The UpSetPlot documentation has a tutorial example that uses a movies dataset: https://upsetplot.readthedocs.io/en/stable/formats.html#When-category-membership-is-indicated-in-DataFrame-columns
After creating the data from the "Genre" memberships and plotting it, how do I also list the names of the movies?
In the plot, I want to print the list of movies at each intersection. For example, at the intersection of size 48, I want to list those 48 movies.

In the example on the documentation page, this information is contained in the DataFrame `movies_by_genre`, which is defined as `movies_by_genre = from_indicators(genre_indicators, data=movies)`. We can extract the required information from this DataFrame. We just need to make sure that the order of the entries in the boolean tuple of length 20, `(True, False, ..., True)`, taken from the index of the pandas Series `u.intersections`, matches the order of the index levels of the pandas Series `movies_by_genre.Genre`, since UpSet may sort the categories into a different order. I used a dict to map between the two orders. For reproducibility, the end-to-end Python script is given below:
```python
# ! pip install upsetplot
# ! pip install smartprint
import pandas as pd
from upsetplot import UpSet, from_indicators
from smartprint import smartprint as sprint


def get_movie_list_at_intersection(u, movies_by_genre, col=0):
    """
    Args:
        u: result of the call UpSet(movies_by_genre, min_subset_size=15, show_counts=True)
        movies_by_genre: result of from_indicators(genre_indicators, data=movies)
        col: column (intersection) number; 0 is the first intersection, the one with 48 elements
    Returns:
        list of movie titles at column number col
    """
    # Category names and the boolean membership key of intersection `col`, in UpSet's order
    keys = list(u.intersections.index.names)
    values = list(u.intersections.index[col])
    # Fix the order of columns between the plot and the movies_by_genre DataFrame
    dict_ = dict(zip(keys, values))
    column_names_in_df_movies_by_genre = movies_by_genre.Genre.index.names
    mapped_boolean = [*map(dict_.get, column_names_in_df_movies_by_genre)]
    # Select all rows whose full MultiIndex key matches this intersection
    movie_list = movies_by_genre.loc[tuple(mapped_boolean)].Title.tolist()
    return movie_list


movies = pd.read_csv("https://raw.githubusercontent.com/peetck/IMDB-Top1000-Movies/master/IMDB-Movie-Data.csv")
# One boolean indicator column per genre, built from the comma-separated Genre strings
genre_indicators = pd.DataFrame([{cat: True for cat in cats}
                                 for cats in movies.Genre.str.split(',').values]).fillna(False)
movies_by_genre = from_indicators(genre_indicators, data=movies)
u = UpSet(movies_by_genre, min_subset_size=15, show_counts=True)

# For the 4th intersection set, i.e. column number 3, the following
# outputs the corresponding list of 15 movies
sprint(get_movie_list_at_intersection(u, movies_by_genre, 3))
sprint(len(get_movie_list_at_intersection(u, movies_by_genre, 3)))
```
Output:
get_movie_list_at_intersection(u, movies_by_genre, 3) : ['Nocturnal Animals', 'Miss Sloane', 'Forushande', 'Kynodontas', 'Norman: The Moderate Rise and Tragic Fall of a New York Fixer', 'Black Swan', 'The imposible', 'The Lives of Others', 'Zipper', 'Lavender', 'Man Down', 'A Bigger Splash', 'Flight', 'Contagion', 'The Skin I Live In']
len(get_movie_list_at_intersection(u, movies_by_genre, 3)) : 15
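The core of `get_movie_list_at_intersection` is translating an intersection's boolean key from UpSet's category order into the level order of `movies_by_genre`. Below is a minimal, dependency-free sketch of that idea, assuming plain `(title, genres)` pairs instead of a DataFrame; the helper names `group_titles_by_membership` and `reorder_key` and the toy data are hypothetical, not part of upsetplot. Grouping the titles once also turns each intersection into a dict lookup rather than a separate `.loc` call.

```python
from collections import defaultdict


def group_titles_by_membership(rows, categories):
    """Group titles by their boolean membership tuple, ordered as `categories`.

    rows: iterable of (title, set_of_genres) pairs, a toy stand-in for
    movies_by_genre, whose MultiIndex holds one boolean level per genre.
    """
    groups = defaultdict(list)
    for title, genres in rows:
        key = tuple(cat in genres for cat in categories)
        groups[key].append(title)
    return groups


def reorder_key(key, from_order, to_order):
    """Re-express a boolean key given in `from_order` in `to_order` (the dict(zip(...)) trick)."""
    lookup = dict(zip(from_order, key))
    return tuple(lookup[cat] for cat in to_order)


df_order = ["Action", "Comedy", "Drama"]    # hypothetical level order in the DataFrame
plot_order = ["Drama", "Action", "Comedy"]  # a different order, as UpSet may sort categories
rows = [("A", {"Drama"}), ("B", {"Drama", "Action"}), ("C", {"Comedy"}), ("D", {"Drama"})]

groups = group_titles_by_membership(rows, df_order)
plot_key = (True, False, False)             # "Drama only", expressed in plot_order
print(groups.get(reorder_key(plot_key, plot_order, df_order), []))  # ['A', 'D']
```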
EDIT:
Upon clarification from the OP, the list of names should be printed on the plot itself. So we can follow the same method and add the text to the plot manually. I did the following:
- Modified the `_plot_bars()` function inside `upsetplot/plotting.py` so that it adds text from a list attribute called `lol_of_intersection_names` (lol stands for "list of lists"). Additionally, I added an `alpha` parameter (`alpha=0.5` in the example below) to make the bars semi-transparent when `ax.bar` is called; otherwise the text would not be visible:
```python
# Excerpt from the modified UpSet._plot_bars() in upsetplot/plotting.py;
# ax, x, data_df, colors, cum_y, use_labels and all_rects are defined earlier in the method
for (name, y), color in zip(data_df.items(), colors):
    rects = ax.bar(x, y, .5, cum_y,
                   color=color, zorder=10,
                   label=name if use_labels else None,
                   align='center', alpha=0.5)  # alpha added so the text stays visible
    cum_y = y if cum_y is None else cum_y + y
    all_rects.extend(rects)
    ############# Start of Snippet
    # Iterate over each bar (one per intersection), using the rectangles
    # returned by ax.bar rather than indexing into ax.patches
    counts = y.tolist()
    for bar_num, bar in enumerate(rects):
        # Spread the names of this intersection evenly up the bar
        for counter in range(counts[bar_num]):
            ax.text(bar.get_x() + bar.get_width() / 2,
                    bar.get_y() + bar.get_height() * counter / counts[bar_num],
                    self.lol_of_intersection_names[bar_num][counter],
                    color='blue', ha='center', va='center', fontsize=0.5)
    ############# End of Snippet

self._label_sizes(ax, rects, 'top' if self._horizontal else 'right')
```
- Attached the list as an attribute of the `UpSet` object `u`, so that it can be accessed inside `_plot_bars()` as `self.lol_of_intersection_names`, as shown below:
```python
import matplotlib.pyplot as plt

u = UpSet(movies_by_genre, min_subset_size=15, show_counts=True)
lol_of_intersection_names = []  # lol: list of lists, one list of titles per intersection
for i in range(u.intersections.shape[0]):  # one entry per bar in the plot
    lol_of_intersection_names.append(get_movie_list_at_intersection(u, movies_by_genre, i))
u.lol_of_intersection_names = lol_of_intersection_names  # read by the modified _plot_bars()
u.plot()
plt.savefig("Upset_plot.png", dpi=600)
plt.show()
```
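To check where the names land without running matplotlib, the positioning arithmetic from the modified `_plot_bars()` snippet can be pulled out into a small pure function. This is a sketch: the function name `label_positions` is hypothetical, and it takes the same four numbers that `bar.get_x()`, `bar.get_width()`, `bar.get_y()` and `bar.get_height()` return for a matplotlib `Rectangle`.

```python
def label_positions(bar_x, bar_width, bar_y, bar_height, names):
    """Return (x, y, text) triples that spread `names` evenly up one bar.

    Same arithmetic as the modified _plot_bars() snippet: each label is
    centred horizontally on the bar, and label i is placed at the fraction
    i / n of the bar's height, where n is the number of names.
    """
    n = len(names)
    x_center = bar_x + bar_width / 2
    return [(x_center, bar_y + bar_height * i / n, name)
            for i, name in enumerate(names)]


# A bar like those drawn by ax.bar(x, y, .5, align='center') at x = 0 with height 4
print(label_positions(-0.25, 0.5, 0, 4, ["a", "b", "c", "d"]))
# [(0.0, 0.0, 'a'), (0.0, 1.0, 'b'), (0.0, 2.0, 'c'), (0.0, 3.0, 'd')]
```

With this spacing the first name is centred on the bar's baseline. One possible tweak, not part of the original answer, is to use `(i + 0.5) / n` so every name sits inside its own slice of the bar.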
Finally, the output looks as shown below:
However, given the long lists of names, I am unsure of the practical value of plotting like this: only when I save the image at 600 DPI can I zoom in and read the movie names.

- The above code does not print the intersection list on the plot itself. I want to print the values on the plot. – Uqhah Apr 18 '23 at 17:02
- @Uqhah thanks for the clarification. I have updated the answer; please let me know whether it works for you now. Also, please feel free to let me know if you would like any part improved in clarity or reproducibility. – lifezbeautiful Apr 19 '23 at 04:49
- Thanks for the quick response. The for loop uses range(15); how do I make it automatically find the number of bars and get that many intersections? – Uqhah Apr 19 '23 at 18:36
- @Uqhah, I have updated the answer to remove the 15. It is basically the length of the `u.intersections` Series (`u.intersections.shape[0]`). Please feel free to accept the answer if this solves your problem, and/or let me know if something is lacking or could be improved further. – lifezbeautiful Apr 20 '23 at 08:19