
This is my situation:

library(UpSetR)

movies <- read.csv(system.file("extdata", "movies.csv", package = "UpSetR"), header = TRUE, sep = ";")

upset(movies, sets = c("Action", "Adventure", "Comedy", "Drama", "Mystery",  "Thriller", "Romance", "War", "Western"), 
      order.by = "freq")

I would like to improve the plot by removing variables (genres) that are displayed alone, without any intersections with other variables.

How can I modify the code to remove these isolated variables?

Borexino

1 Answer


You can filter those rows out of the data before you draw the plot. For example:

sets <- c("Action", "Adventure", "Comedy", "Drama", "Mystery",  "Thriller", "Romance", "War", "Western")

# keep only rows (movies) that belong to more than one of the selected sets
reduced_data <- movies[rowSums(movies[, sets]) > 1, ]
# or with dplyr (needs library(dplyr), version 1.1.0 or later for pick())...
# reduced_data <- movies %>% filter(rowSums(pick(all_of(sets))) > 1)

upset(reduced_data, sets = sets, 
      order.by = "freq")

This gives you an UpSet plot with no single-set groups.
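If it helps to see what the `rowSums(...) > 1` filter does before running it on the full data, here is a minimal sketch on a made-up data frame. The names `toy`, `toy_sets`, `n_sets` and `toy_reduced` are hypothetical and only for illustration; the values are not taken from movies.csv.

```r
# Toy membership table (hypothetical data): one row per item,
# one 0/1 column per set
toy <- data.frame(
  name   = c("a", "b", "c", "d", "e"),
  Action = c(1, 1, 0, 0, 1),
  Comedy = c(0, 1, 1, 0, 1),
  Drama  = c(0, 0, 1, 0, 1),
  stringsAsFactors = FALSE
)
toy_sets <- c("Action", "Comedy", "Drama")

# Number of selected sets each row belongs to
n_sets <- rowSums(toy[, toy_sets])

# Keep rows that belong to at least two sets; this drops "a" (one set only)
# and "d" (no set at all)
toy_reduced <- toy[n_sets > 1, ]

# Sanity check: every remaining row is part of a real intersection
stopifnot(all(rowSums(toy_reduced[, toy_sets]) >= 2))
```

Note that the filter works on rows, not on sets, so it also drops rows that belong to none of the selected sets, and a set whose members all appear alone would end up empty after filtering.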

MrFlick