UpSet – Visualizing Intersecting Sets

Understanding relationships between sets is an important analysis task. The major challenge in this context is the combinatorial explosion of the number of set intersections if the number of sets exceeds a trivial threshold. The most common set visualization approach, the Venn diagram, doesn’t scale beyond three or four sets. To address this, we developed UpSet, a novel visualization technique for the quantitative analysis of sets, their intersections, and aggregates of intersections.

UpSet visualizes set intersections in a matrix layout. The matrix layout enables the effective representation of associated data, such as the number of elements in the intersections.
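
To give a rough sense of the combinatorial explosion mentioned above: each element is either inside or outside each set, so n sets allow 2^n membership combinations, one of which is "in none of the sets". The following minimal Python sketch only does that arithmetic; the function name is hypothetical and not part of any UpSet tool.

```python
# Count the membership combinations (possible intersections) for n sets.
# Each element is either in or out of every set, which gives 2**n combinations.
# One of them is the "in none of the sets" combination.

def num_intersections(n_sets: int, include_empty: bool = True) -> int:
    total = 2 ** n_sets
    return total if include_empty else total - 1


for n in (3, 4, 6, 10, 20):
    print(f"{n:>2} sets -> {num_intersections(n):>9,} possible intersections")
```

Three sets give 8 combinations, which a Venn diagram can still show; six sets already give 64.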

If you use an UpSet figure in a publication, please cite the original paper:
Alexander Lex, Nils Gehlenborg, Hendrik Strobelt, Romain Vuillemot, Hanspeter Pfister. UpSet: Visualization of Intersecting Sets. IEEE Transactions on Visualization and Computer Graphics (InfoVis), 20(12): 1983-1992, doi:10.1109/TVCG.2014.2346248, 2014.
If you created an UpSet figure with UpSetR, please also cite the UpSetR paper:
Jake R. Conway, Alexander Lex, Nils Gehlenborg. UpSetR: An R Package for the Visualization of Intersecting Sets and Their Properties. Bioinformatics, 33(18): 2938-2940, doi:10.1093/bioinformatics/btx364, 2017.

UpSet Explained

UpSet plots the intersections of sets as a matrix, as shown in the following figure. Each column corresponds to a set, and bar charts on top show the size of each set. Each row corresponds to a possible intersection: the filled-in cells show which sets are part of an intersection. Also notice the lines connecting the filled-in cells: they show in which direction you should read the plot:

Explaining the matrix approach in UpSet.

Here you can see examples of how these intersections correspond to the segments in a Venn diagram. The first row in the figure is completely empty: it corresponds to all the elements that are in none of the sets. The green (third) row corresponds to the elements that are only in set B (not in A or C). The orange (fifth) row represents elements that are shared by sets A and B, but not with C. Finally, the last (violet) row represents the elements shared between all sets.

Explaining the intersections in UpSet
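
The rows described above can be computed with a few set operations. The sketch below is a minimal illustration, not the code of any UpSet implementation: the element values and the function name are made up, and the rows are generated by degree so that they come out in the same order as in the figure (none, A, B, C, A+B, A+C, B+C, A+B+C).

```python
from itertools import combinations

# Hypothetical toy data: set names follow the figure, the elements are invented.
sets = {
    "A": {1, 2, 3, 4, 7},
    "B": {3, 4, 5, 7},
    "C": {4, 6, 7, 8},
}
universe = set(range(1, 10))  # element 9 is in none of the sets


def exclusive_intersections(sets, universe):
    """Map each combination of set names to the elements that are in
    exactly those sets and in no other set (one matrix row each)."""
    names = list(sets)
    rows = {}
    for degree in range(len(names) + 1):
        for combo in combinations(names, degree):
            members = set(universe)
            for name in names:
                if name in combo:
                    members &= sets[name]  # filled-in cell: must be in this set
                else:
                    members -= sets[name]  # empty cell: must not be in this set
            rows[combo] = members
    return rows


rows = exclusive_intersections(sets, universe)
for combo, members in rows.items():
    cells = " ".join("x" if name in combo else "." for name in sets)
    print(cells, sorted(members))
```

Because every element matches exactly one row, the row sizes add up to the size of the whole dataset.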

This layout is great because we can plot the size of the intersections (the “cardinality”) as bar charts right next to the matrix, as you can see in the following example:

Plotting intersection sizes with bars in UpSet.

This makes the size of intersections easy to compare.
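
As a rough text-only sketch of this idea, the snippet below prints one matrix row per intersection with a bar whose length is the cardinality. The sizes are hypothetical, and the helper names are ours; real UpSet tools draw proper charts.

```python
# Hypothetical cardinalities for three sets A, B, C, keyed by the sets
# that take part in each intersection.
set_names = ["A", "B", "C"]
sizes = {
    (): 1,
    ("A",): 2,
    ("B",): 1,
    ("C",): 2,
    ("A", "B"): 1,
    ("A", "C"): 0,
    ("B", "C"): 0,
    ("A", "B", "C"): 2,
}


def render_row(combo, size):
    """One matrix row: 'x' marks a participating set, then a bar of '#'."""
    cells = " ".join("x" if name in combo else "." for name in set_names)
    return f"{cells}  {'#' * size} {size}"


def render(sizes):
    """Header with the set names, followed by one row per intersection."""
    lines = [" ".join(set_names)]
    lines += [render_row(combo, size) for combo, size in sizes.items()]
    return "\n".join(lines)


print(render(sizes))
```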

The matrix is also very useful because it can be sorted in various ways. A common way is to sort by the cardinality (size), as shown in the following figure, but it’s also possible to sort by degree (the number of sets that take part in an intersection), by set, or by any other criterion.

Sorting by cardinality in UpSet
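
Both sort orders reduce to choosing a different sort key. The following sketch assumes a hypothetical table of intersection sizes and shows sorting by cardinality and by degree; it is an illustration of the idea, not the sorting code of a specific UpSet version.

```python
# Hypothetical intersection sizes for three sets A, B, C.
sizes = {
    (): 1,
    ("A",): 12,
    ("B",): 5,
    ("C",): 8,
    ("A", "B"): 3,
    ("A", "C"): 7,
    ("B", "C"): 2,
    ("A", "B", "C"): 20,
}

# Sort by cardinality: largest intersection first.
by_cardinality = sorted(sizes.items(), key=lambda item: item[1], reverse=True)

# Sort by degree (number of participating sets); within the same degree,
# larger intersections come first.
by_degree = sorted(sizes.items(), key=lambda item: (len(item[0]), -item[1]))

for title, rows in (("cardinality", by_cardinality), ("degree", by_degree)):
    print(f"sorted by {title}:")
    for combo, size in rows:
        label = "+".join(combo) if combo else "(none)"
        print(f"  {label:<8} {size}")
```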

Finally, UpSet works just as well horizontally or vertically. Vertical layouts are better for interactive UpSet plots that can be scrolled, while horizontal layouts are best for figures in papers.

Horizontal layout in UpSet

These are the basics of UpSet! There’s a lot more that you can do with UpSet plots, such as visualizing attributes of the intersections or grouping intersections. Look at the Interactive UpSet page for details.

Comparing UpSet to Venn Diagrams

Venn diagrams are not suitable for visualizing intersections of more than three or four sets. The figure below shows an example of a six-set Venn diagram published in Nature that shows the relationship between the banana’s genome and the genomes of five other species by visualizing which genes are shared between the plant species.

Six set banana venn diagram.

While this figure looks fun, it is not a useful visualization. Try to extract any information from it. It’s really hard to trace which intersection involves which sets. It’s not obvious which is the biggest intersection from the visualization – you have to read the labels one by one.

You might ask, how does the banana Venn diagram look in UpSet? Here you go:

UpSet Screenshot

It is a little hard to read because the figure is rather small. But we can simply remove the small intersections, and we get a nice plot that shows us the main features of the data:

UpSet Screenshot
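
Removing the small intersections amounts to applying a minimum-size threshold. The sketch below uses hypothetical sizes and set names (not the banana data) and a made-up helper name; it also reports how many elements end up hidden, which is worth knowing before trimming a plot.

```python
# Hypothetical intersection sizes; not taken from the banana dataset.
sizes = {
    (): 1,
    ("A",): 12,
    ("B",): 5,
    ("C",): 8,
    ("A", "B"): 3,
    ("A", "C"): 7,
    ("B", "C"): 2,
    ("A", "B", "C"): 20,
}


def drop_small(sizes, min_size):
    """Keep only intersections with at least min_size elements."""
    return {combo: size for combo, size in sizes.items() if size >= min_size}


kept = drop_small(sizes, min_size=5)
hidden_elements = sum(sizes.values()) - sum(kept.values())

print(f"kept {len(kept)} of {len(sizes)} intersections")
print(f"{hidden_elements} elements are no longer shown")
```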

Notice how easy it is to see trends: the vast majority of genes are shared between all plants, as highlighted in the next figure:

UpSet Screenshot

Similarly, the first three species (Oryza_sativa, Sorghum_bicolor, and Brachypodium_distachyon) seem to be highly related, as all of them are part of the top three intersections. In contrast, the sixth species (Phoenix dactylifera) seems to be the most different from the others: after the intersection shared by all species, it only appears again in the sixth-largest intersection.

UpSet Screenshot

Such an analysis is almost impossible with a Venn diagram! In summary, if you want to visualize intersections of two or three sets, use a Venn diagram, as Venn diagrams are widely known and well understood. For a dataset with more than three (and fewer than ~30) sets, use UpSet!

More Information

For more details on the UpSet concept please refer to the paper on UpSet and to the Interactive UpSet page.

For a discussion of when to use which kind of set visualization, refer to the Points of View: Sets and Intersections.

Frequently Asked Questions

  • How can I create high-resolution UpSet plots for a paper or other publication?

    There are three options:

    • If you prefer to use the interactive web-based version, you can print an interactive UpSet plot to a PDF and edit the PDF with vector editing software such as Adobe Illustrator.
    • You can generate an exportable figure using a programming language such as R or Python.
    • You can create a static figure using, e.g., the R-Shiny versions of UpSet.

    To explore all of these options, please refer to the versions page.

  • Can I show attributes of the intersections?

    Yes, all versions support visualizing attributes in some way.

  • Can I export the elements in a particular intersection?

    Yes, but to our knowledge, only the interactive UpSet 2 version supports this.