webster

goal

create an exploratory visualization of the dictionary learning algorithm Webster that correlates genes to various biological pathways

ROLE

Project Lead, Project Manager, Designer

TOOLS

Lucid Chart, Adobe XD

TIMESPAN

4 Months

what is webster?

Webster is a sparse dictionary learning algorithm that maps genes to biological functions.

...but what does that mean?

A sparse dictionary learning algorithm is a type of algorithm that is also used in fields such as natural language processing and image compression. It describes each item in a dataset as a combination of just a few learned building blocks (the "dictionary").

Here, it is used to create a network of relationships between genes and biological functions. It does this by essentially scoring each gene against each function, so that genes can be ranked by how strongly they are tied to that function.
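To make the ranking idea concrete, here is a minimal sketch in Python. It does not run Webster or learn anything from data; it only assumes that the algorithm's output can be read as a weight per gene per function. The gene names, function names, and weights are made up for illustration, and `rank_genes` is a hypothetical helper.

```python
# Toy weights: gene -> {function: weight}. All names and numbers are
# invented for illustration; they are not Webster output.
LOADINGS = {
    "GENE_A": {"dna_repair": 0.9, "cell_cycle": 0.1},
    "GENE_B": {"dna_repair": 0.4},
    "GENE_C": {"cell_cycle": 0.8, "translation": 0.3},
    "GENE_D": {"translation": 0.7},
}


def rank_genes(loadings, function):
    """Return (gene, weight) pairs for one function, strongest first."""
    scored = [
        (gene, weights[function])
        for gene, weights in loadings.items()
        if function in weights  # sparse: most genes carry no weight here
    ]
    return sorted(scored, key=lambda pair: abs(pair[1]), reverse=True)


print(rank_genes(LOADINGS, "dna_repair"))
# [('GENE_A', 0.9), ('GENE_B', 0.4)]
```

The "sparse" part shows up in the data itself: each gene only carries weights for a few functions, so most gene and function pairs simply have no entry.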

data specifics

Two datasets were used to train this model:

Durocher Dataset

Testing dataset

DepMap Portal and Dataset

Applied dataset

DepMap is one of the largest cancer data portals in the world and one of the most widely referenced datasets, which makes applying the algorithm to it crucial for establishing the algorithm's validity.

Both of these datasets contain gene knockout screens of cancer cells. A gene knockout is when a gene is cut out of the DNA in order to observe the downstream effects that the lack of that gene has on the cell. These downstream effects are what help scientists figure out which genes play a part in which biological functions.

Typically, gene effects are studied at a more granular level, looking at only a couple of specific genes or a single function at a time.

This gives us our use case:

Webster is a new method of analysis where all genes and functions are taken into account at once. This allows scientists to find genes they didn't expect to play a part in a particular function.

UMAP innovation

UMAP stands for "Uniform Manifold Approximation and Projection." It takes multi-dimensional data and presents it in 2D as a scatter plot. This is useful for many different types of analysis and is widely used in cancer research and other biological analyses.

A 3rd dimension can be used, but 3D position is harder for the brain to interpret, so the extra dimension often adds confusion rather than clarity. On top of that, 3D plots are often difficult to navigate, which hurts usability.

Our question was,

Can we use the 3rd dimension to add more context to the original 2D UMAP?

Our answer was,

Yes we can!

Here, we are isolating a single function on the z-axis. This allows for clearer interpretation of the genes involved in a given function, and it makes each point's value easier to read, because position along a single axis communicates quantity more clearly than color saturation does in the 2D UMAP. The next step in the process is to make this data explorable by making the UMAP interactive.
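As a rough sketch of the idea, the 3D view can be thought of as the same 2D points with one extra coordinate: each gene's weight on the selected function. The Python below assumes made-up coordinates and weights, and `lift_to_3d` is a hypothetical helper, not code from the prototype.

```python
# Made-up 2D UMAP coordinates: gene -> (x, y).
POINTS_2D = {
    "GENE_A": (1.0, 2.0),
    "GENE_B": (1.5, 2.5),
    "GENE_C": (-3.0, 0.5),
}

# Made-up weights: gene -> {function: weight}.
WEIGHTS = {
    "GENE_A": {"dna_repair": 0.9},
    "GENE_B": {"dna_repair": 0.4},
    "GENE_C": {"cell_cycle": 0.8},
}


def lift_to_3d(points_2d, weights, function):
    """Keep x and y, and use the gene's weight on `function` as z."""
    return {
        gene: (x, y, weights.get(gene, {}).get(function, 0.0))
        for gene, (x, y) in points_2d.items()
    }


for gene, xyz in lift_to_3d(POINTS_2D, WEIGHTS, "dna_repair").items():
    print(gene, xyz)
```

Genes with no weight on the selected function stay flat at z = 0, so only the genes involved in that function rise off the original 2D plane.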

site-mapping

After all of that research and discovery, which mostly meant speaking with the scientist and learning how this all works, we moved on to how we wanted users to interact with the data. We did this by defining the information architecture of the site and how we would present the interactions to users.

Above are all the major interactions we were considering when planning.

Let's break down the flow:

^ Upon entering the site, we decided it was important to limit the amount of information presented so as not to overwhelm the user. We decided to guide the user by having them filter the datasets first, so that there weren't millions of points on the screen at a time. This would not only help load time, but also make the plot easier to navigate.

^ After the filtering, two things would happen: the UMAP would populate based on the filters, and a list of small multiples of the functions would appear. The user would now be able to select points within the UMAP, or pick a small multiple to select a function.

^ Once a plot point or small multiple is selected, a couple things can happen.

Firstly, tertiary information about the selected point would be shown.

Secondly, if a function was selected, the user would be able to toggle the UMAP between the 2D and 3D views. While the UMAP is in the 3D view, you would still be able to select other points on the plot, so that you can compare genes within the function without deselecting the function. To keep this functionality, we later decided that the plot should only be flipped into 3D when a function is selected through the small multiples, rather than on the plot itself. This would allow the user not only to compare genes of a selected function, but also to view tertiary information about other functions while staying on the previously selected function. This would help in comparisons.

When we were initially working through these options, we thought about how we could bring compounds (drugs) into the plot, as they are another point of comparison that would be useful to scientists. However, we decided against adding them to the first iteration because of some API limitations. We felt that adding more color to the interface to represent compounds would be too confusing for the user. We thought about using shapes in place of color, but the API we were using, Scatterplot GL, would not allow for this. We even reached out to the original Google developers who made this API to see if there was a workaround, but it would have taken more time than we had planned, so we deemed it out of scope.
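The rule about when the plot can flip into 3D can be sketched as a small piece of view state. This is only an illustration in Python under my own naming; `PlotState` and its methods are hypothetical and are not how the prototype is actually built.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class PlotState:
    """Hypothetical view state for the plot, for illustration only."""

    selected_function: Optional[str] = None
    selected_points: Set[str] = field(default_factory=set)
    view: str = "2d"

    def select_function(self, name: str) -> None:
        # Called when a small multiple is clicked.
        self.selected_function = name

    def clear_function(self) -> None:
        # Without a function there is nothing to put on the z-axis.
        self.selected_function = None
        self.view = "2d"

    def select_point(self, point_id: str) -> None:
        # Allowed in both views, so genes can be compared while in 3D.
        self.selected_points.add(point_id)

    def toggle_view(self) -> str:
        # The flip to 3D is only available once a function is selected.
        if self.selected_function is None:
            return self.view
        self.view = "3d" if self.view == "2d" else "2d"
        return self.view


state = PlotState()
state.toggle_view()                  # no function selected, view is unchanged
state.select_function("dna_repair")  # chosen through a small multiple
state.toggle_view()                  # flips to the 3D view
state.select_point("GENE_B")         # function and 3D view are both kept
print(state.view, state.selected_function, sorted(state.selected_points))
```

Keeping the function selection separate from the point selection is what lets a user click around the plot without losing the function shown on the z-axis.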

^ Another important feature is the search bar, where a function or gene can be searched for and is then highlighted and selected within the plot. We also thought the search could be useful for finding multiple points at once. This is common in scientific applications, since scientists often have a list of specific genes they would like to compare to each other. It is usually done by having the user provide a list with each item separated by a comma and a space.

There are other ways we thought about supporting multiselect for comparing genes and functions. The user may want to select all the points in a given area, or choose each point individually by eye. Beyond seeing the tertiary information for all of these points, users often need to take the selection they have curated with them, so they can work with the raw data and analyze it alongside other data from their research. For this, we planned to add a button that lets the user download the metadata of one or multiple points.
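A minimal sketch of how such a comma-separated query could be split into individual search terms is shown below in Python. The function name `parse_query` is hypothetical, and dropping empty entries and case-insensitive duplicates is my assumption about what a forgiving search bar would do.

```python
def parse_query(text):
    """Split a comma-separated query into unique, trimmed search terms."""
    seen = set()
    terms = []
    for raw in text.split(","):
        term = raw.strip()
        key = term.upper()  # assumption: gene symbols match case-insensitively
        if term and key not in seen:
            seen.add(key)
            terms.append(term)
    return terms


print(parse_query("BRCA1, TP53,brca1, "))
# ['BRCA1', 'TP53']
```

Splitting on the comma alone and then trimming whitespace means a pasted list works whether or not the user typed a space after each comma.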
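As an illustration of what that download could contain, the sketch below turns a list of selected points into CSV text with Python's standard `csv` module. The column names and values are made up, and `metadata_to_csv` is a hypothetical helper rather than part of the prototype.

```python
import csv
import io


def metadata_to_csv(rows, columns):
    """Serialize selected points to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        extrasaction="ignore",  # skip fields that were not asked for
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up selection; a missing value is written as an empty cell.
selection = [
    {"gene": "GENE_A", "x": 1.0, "y": 2.0, "function": "dna_repair"},
    {"gene": "GENE_B", "x": 1.5, "y": 2.5},
]
print(metadata_to_csv(selection, ["gene", "x", "y", "function"]))
```

The same function covers a single point or many, since one selected point is just a list with one row.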

wireframes

We made multiple wireframes to figure out how to optimize space and where it makes sense to put each component.

^ We decided to put the selection types and search bar inside the actual window of the plot because they directly affect the selection of the plot. We also played around with where to put the small multiples so that they would be in an accessible area of the screen. Also, we started playing with the idea of having multiple cards for tertiary information if multiple points were selected. Multiple selection didn't end up getting built for this prototype though, so we also didn't implement the multiple card feature.

mockups and style guide

^ As we got further into the design process, a few more components were rearranged, as you can see here. We moved some of the components to the upper corner of the plot panel, which left more room for the small multiples on the bottom left. We also fit in some extra components, such as the lock and hide options for genes/functions/compounds, in case users want to look at only certain data types at a time.

You can also see that the card on the right looks significantly different from the cards in the wireframes. This is due to additional information we received from the scientist about what tertiary information should be displayed. In this case, we decided to use some of the heatmaps generated by the algorithm, which give more insight into the activity of the function across cell lines. These also did not end up in the prototype because of the complication of either creating each heatmap on the fly or storing thousands of SVGs for each function and gene. This feature will hopefully be added in future releases of the tool.

^ Above is the style guide that I built out for our dev, Ben, who was building the prototype. I used some Material Design principles to make the elements feel more tangible, but generally kept things on the simpler side. Some elements, such as the pink and purple, were kept similar to colors used in previous minisites, and other colors were taken from the paper to keep the site and the paper consistent.

prototype

try out the prototype here!

^ As you can see, a lot has changed between the mockups and the prototype. If I were to do this all over again, I would build out the style guide in more detail and spend more time with the developer to advocate for the style more strictly. At the end of the day though, we approached this problem with function over form, and the visual style is something that can be iterated on over time.

takeaways

I learned a lot throughout this process, but two takeaways stand out. The first is how to lead a team through an entire project and how to plan accordingly.

The second takeaway was how to put the design process into action. With the help of my mentor Andrew, I went through a few iterations of defining my design process and ended up coming up with the process shown here.

contact me

email elena.cuadra.berg@gmail.com

linkedin Elena Berg

I currently live in Pittsburgh, PA :)

Product Designer