
I'm trying to plot a stacked bar plot inside an UpSet plot using the ComplexUpset package. The plot I'd like to get looks like the stacked-bar example from the ComplexUpset docs, where the bars are filled by `mpaa` (in my data, `type` would play the role of `mpaa`).

I have a data frame with 57,244 rows and 21 columns: an `ID` column, a `type` column (the type of recording), and 19 component columns, `component1` to `component19`:

```
ID  component1  component2  ...  component19  type
1   1           0           ...  1            a
2   0           0           ...  1            b
3   1           1           ...  0            b
```

Ones and zeros indicate membership in a given component. Following the example in the ComplexUpset docs, I first convert these ones and zeros to logical values and then try to draw a basic UpSet plot. Here's the code:

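```r
library(dplyr); library(ComplexUpset)
components <- colnames(df)[2:20]  # component columns sit between ID and type
# convert only the 0/1 component columns, leaving ID and type untouched
df <- df %>% mutate(across(all_of(components), as.logical))
upset(df, components, name = 'protein', width_ratio = 0.1)
```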

Unfortunately, after running for a while, the last line fails with this error:

```
Error: cannot allocate vector of size 176.2 Mb
```

Although my machine has 32 GB of RAM, I doubt the memory is so full that a 176.2 MB vector can't be allocated, so my guess is that I'm somehow managing memory in R wrongly. Could you please explain what's faulty in my code?

I also know that the UpSetR package can plot the same data, but as far as I know it has no option for stacked bar plots.

  • Use `sample_n`, to just get a reduced version of your data, and try it out. – Mossa Nov 15 '21 at 12:40
  • @Mossa `sample_n` works for some values, and it plots the graph, but should the sampled data be used? Is there a way to use the whole dataset? – mrbelyash Nov 15 '21 at 12:46
  • This is not a solution, but a tool for you to investigate what is going on. Maybe you have a factor with unused levels, or values that contribute nothing to the plot. If the call works with `sample_n`, you can see what isn't important in the plot and tweak that. – Mossa Nov 15 '21 at 12:49
  • Alright, thank you – mrbelyash Nov 15 '21 at 12:50
  • Dig through the backtrace (the result of running `traceback()` after encountering the error) to identify the call in the stack that is trying to allocate that vector. It may also help to read `` ?`Memory-limits` ``. – Mikael Jagan Nov 15 '21 at 13:16
  • Try increasing `min_size` or `min_degree` arguments of `upset()` to only show interesting intersections. The number of combinations in proteomics data can be huge. – krassowski Nov 16 '21 at 07:00
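To gauge krassowski's point about the number of combinations before calling `upset()`, one rough base-R check is to count the distinct membership patterns (one per row) and how many of them reach a given `min_size`. This is a minimal sketch: the helper `count_intersections` and the toy data are hypothetical, and it assumes that each distinct pattern corresponds to one bar in the default (exclusive) intersection view. With 19 components there can be up to 2^19 = 524,288 such patterns.

```r
# Hypothetical helper: count distinct membership patterns across the set columns
# and how many of them contain at least `min_size` rows.
count_intersections <- function(df, sets, min_size = 1) {
  # One string per row, e.g. "TRUE|FALSE|TRUE", encoding which sets the row belongs to
  pattern <- do.call(paste, c(lapply(df[sets], as.logical), sep = "|"))
  sizes <- table(pattern)
  list(
    n_patterns = length(sizes),            # number of distinct intersections
    n_kept = sum(sizes >= min_size),       # intersections surviving the threshold
    sizes = sort(sizes, decreasing = TRUE) # intersection sizes, largest first
  )
}

# Hypothetical toy data shaped like the question's df (ID, components, type)
toy <- data.frame(
  ID = 1:6,
  component1 = c(1, 0, 1, 1, 1, 1),
  component2 = c(0, 0, 1, 1, 1, 0),
  component3 = c(1, 1, 0, 0, 1, 1),
  type = c("a", "b", "b", "a", "a", "b")
)
res <- count_intersections(toy, paste0("component", 1:3), min_size = 2)
res$n_patterns
res$n_kept
```

On the real data this would be something like `count_intersections(df, components, min_size = 100)`. If `n_patterns` runs into the thousands, that many bars and matrix columns have to be built, which may explain the memory use.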

1 Answer


Somehow, it worked after I did the following:

  1. Increase the `min_size` parameter so that only intersections with at least that many members are drawn. The plot is then less overloaded and easier to read.
  2. Pass a sample of the data (e.g. from `sample_n`) as the first argument of `upset()`. Oddly, this also helped even when the sample was the whole dataset.
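Combining the `min_size` tip with the stacked bars the question asks for, a sketch might look like this. It follows the stacked-bar pattern from the ComplexUpset docs (`intersection_size(counts = FALSE, mapping = aes(fill = ...))`). The randomly generated data frame is a hypothetical stand-in for the real 57,244-row `df`, and `min_size = 5` is an arbitrary threshold you would tune to your data.

```r
library(ggplot2)       # for aes()
library(ComplexUpset)  # for upset() and intersection_size()

# Hypothetical stand-in for the real data: ID, five logical component columns, type
set.seed(1)
n <- 200
df <- data.frame(ID = seq_len(n))
components <- paste0("component", 1:5)
for (comp in components) df[[comp]] <- runif(n) < 0.3
df$type <- sample(c("a", "b"), n, replace = TRUE)

p <- upset(
  df, components,
  name = "protein",
  width_ratio = 0.1,
  min_size = 5,  # hide intersections with fewer than 5 rows
  base_annotations = list(
    "Intersection size" = intersection_size(
      counts = FALSE,             # no count labels on top of the stacked bars
      mapping = aes(fill = type)  # stack each bar by recording type
    )
  )
)
p  # draw the plot
```

Raising `min_size` (or `min_degree`) reduces the number of intersections that are built and drawn, so it is the first knob to try when the full dataset runs out of memory.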