
I'm trying to create a Venn diagram to inspect how many variables (species) are shared between participant groups. I have a dataframe with 97 rows (participants) and 320 columns. The first 2 columns are participant_id and participant_group, and the remaining 318 columns are named after the species and hold their counts. I want a Venn diagram that tells me how many species are shared between all the groups. Here is a reproducible example.

participant_id <- c("P01","P02","P03","P04","P05","P06","P07","P08","P09","P10", "P11", "P12", "P13", "P14", "P15")
participant_group <- c("control", "responsive", "resistant", "non-responsive", "control", "responsive", "resistant", "non-responsive", "resistant", "non-responsive", "control", "responsive", "non-responsive", "control", "resistant")
A <- c (0, 54, 23, 4, 0, 2, 0, 35, 0, 0, 45, 0, 1, 99, 12)
B <- c (10, 0, 1, 0, 4, 65, 0, 1, 52, 0, 0, 15, 20, 0, 0)
C <- c (0, 0, 0, 5, 35, 0, 0, 45, 0, 0 , 0, 22, 0, 89, 50)
D <- c (0, 0, 45, 0, 1, 0, 0, 0, 56, 32, 0, 0, 40, 0, 0)
E <- c (0, 0, 40, 5, 0, 0, 0, 45, 0, 1, 76, 0, 34, 56, 31)
F <- c (0, 64, 1, 5, 0, 0, 80, 0, 0, 1, 76, 0, 34, 0, 32)
G <- c (12, 5, 0, 0, 80, 45, 0, 0, 76, 0, 0, 0, 0, 32, 11)
H <- c (0, 0, 0, 5, 0, 0, 80, 0, 0, 1, 0, 0, 34, 0, 2)
example_df <- data.frame(participant_id, participant_group, A, B, C, D, E, F, G, H)

I can see all the wonderful venn diagram packages out there, but I'm struggling to format my data correctly. I have started with:

library(dplyr)

example_df %>%
  group_by(participant_group) %>%
  summarise(across(where(is.numeric), sum)) %>%
  # 1 if the species was seen in the group at all, 0 otherwise
  mutate(across(where(is.numeric), ~ as.integer(.x > 0)))

So now I have an indication of whether a species (A, B, C, etc.) is present (1) or absent (0) within each group. Next, I want to show the overlap of species between the groups in a Venn diagram (something like this: https://statisticsglobe.com/venn-diagram-with-proportional-size-in-r). However, I am a little stuck on what to do next. Does anybody have any ideas? I hope this makes sense! Thanks for your time.


When using the code from @Paul Stafford Allen, I get a diagram (image omitted), but the goal here is to have something that shows shared presence/absence of species (A, B, C, etc.) between groups, irrespective of the counts.

  • Please clarify exactly what the sets are and what the elements are and what the rule is for determining if an element is in a set. – G. Grothendieck Oct 11 '22 at 13:36
  • Hello, sorry. In this case A, B, C, etc. are the counts of a particular species for every sample. If a count is 0, the species wasn't present at all in that sample. After grouping the participants by group with `example_df %>% group_by(participant_group) %>% dplyr::summarise(across(where(is.numeric), sum))`, a count that is still 0 means the species is not present in the group. The idea here is to see how many species are shared (overlapping) between groups, how many are unique, etc. Does this make more sense? – Svetlina Vasileva Oct 12 '22 at 00:52
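
The membership rule described in the comment above can be written out directly: a species belongs to a group's set when its summed count within that group is greater than zero. The following is a minimal base R sketch on the question's example data, not a definitive implementation; the names `count_cols`, `species_sets`, `shared_by_all` and `all_species` are illustrative and do not come from the original post.

```r
# Example data from the question
example_df <- data.frame(
  participant_id = sprintf("P%02d", 1:15),
  participant_group = c("control", "responsive", "resistant", "non-responsive",
                        "control", "responsive", "resistant", "non-responsive",
                        "resistant", "non-responsive", "control", "responsive",
                        "non-responsive", "control", "resistant"),
  A = c(0, 54, 23, 4, 0, 2, 0, 35, 0, 0, 45, 0, 1, 99, 12),
  B = c(10, 0, 1, 0, 4, 65, 0, 1, 52, 0, 0, 15, 20, 0, 0),
  C = c(0, 0, 0, 5, 35, 0, 0, 45, 0, 0, 0, 22, 0, 89, 50),
  D = c(0, 0, 45, 0, 1, 0, 0, 0, 56, 32, 0, 0, 40, 0, 0),
  E = c(0, 0, 40, 5, 0, 0, 0, 45, 0, 1, 76, 0, 34, 56, 31),
  F = c(0, 64, 1, 5, 0, 0, 80, 0, 0, 1, 76, 0, 34, 0, 32),
  G = c(12, 5, 0, 0, 80, 45, 0, 0, 76, 0, 0, 0, 0, 32, 11),
  H = c(0, 0, 0, 5, 0, 0, 80, 0, 0, 1, 0, 0, 34, 0, 2)
)

# Every column except the two identifier columns holds species counts
count_cols <- setdiff(names(example_df), c("participant_id", "participant_group"))

# One set per group: the species whose summed count in that group is > 0
species_sets <- lapply(
  split(example_df[count_cols], example_df$participant_group),
  function(d) names(d)[colSums(d) > 0]
)

# Species present in every group, and species present in at least one group
shared_by_all <- Reduce(intersect, species_sets)
all_species <- Reduce(union, species_sets)

print(lengths(species_sets))
print(shared_by_all)
```

Each species name appears at most once per set, so the sets describe presence only and the counts play no further role.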

1 Answer


Using

library(VennDiagram)
library(dplyr)
library(magrittr)

I managed the following starting point:

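# Total count of each species within each participant group
groupSums <- example_df %>%
  group_by(participant_group) %>%
  summarise(across(where(is.numeric), sum))

# For each group, repeat every species name as many times as it was counted
forVenn <- lapply(groupSums$participant_group, function(x) {
  rep(names(groupSums)[-1],
      times = unlist(groupSums[groupSums$participant_group == x, -1]))
})

names(forVenn) <- groupSums$participant_group

# force.unique = FALSE keeps the repeated names, so the diagram reflects counts
venn.diagram(forVenn, filename = "Venn.png", force.unique = FALSE)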
Paul Stafford Allen
  • Hi Paul, thanks for your answer. Something is not working here. The code runs, but the diagram I am getting doesn't make sense. In the example dataframe I have 8 variables (species). Through the Venn diagram, I want to see how many of those are shared between the 4 groups, how many are present in only 1, etc. So the numbers in the Venn diagram should sum to 8. – Svetlina Vasileva Oct 12 '22 at 01:01
  • Ah, I took an approach whereby the number of observations in each group is listed. If you just want the group names then a different approach would be needed. – Paul Stafford Allen Oct 12 '22 at 12:33
  • If you're not using the counts in any way, an Upset diagram (package `UpSetR`) would also be a way of displaying this data. – Paul Stafford Allen Oct 12 '22 at 12:39
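
The presence-only approach discussed in these comments might look like the sketch below. It assumes each group's set lists every species once; the sets are hard-coded here and were derived from the question's example data with the "group sum > 0" rule. The names `species_sets`, `membership`, `region` and `region_counts`, and the output file name, are illustrative. The plotting calls are guarded because `VennDiagram` and `UpSetR` may not be installed.

```r
# Per-group species sets (presence only), derived from the question's example
# data by keeping species whose summed count in the group is greater than zero
species_sets <- list(
  control          = c("A", "B", "C", "D", "E", "F", "G"),
  `non-responsive` = c("A", "B", "C", "D", "E", "F", "H"),
  resistant        = c("A", "B", "C", "D", "E", "F", "G", "H"),
  responsive       = c("A", "B", "C", "F", "G")
)

all_species <- Reduce(union, species_sets)

# Logical matrix: one row per species, one column per group
membership <- sapply(species_sets, function(s) all_species %in% s)
rownames(membership) <- all_species

# Label each species with the exact combination of groups it occurs in;
# these combinations are the regions of the Venn diagram
region <- apply(membership, 1, function(in_set) {
  paste(names(species_sets)[in_set], collapse = " & ")
})
region_counts <- table(region)
print(region_counts)

# Venn diagram of the sets; the default force.unique = TRUE counts each
# species once per group, so the region counts sum to the number of species
if (requireNamespace("VennDiagram", quietly = TRUE)) {
  VennDiagram::venn.diagram(species_sets,
                            filename = "venn_presence.png",  # illustrative name
                            imagetype = "png")
}

# UpSet plot of the same sets, as suggested in the last comment
if (requireNamespace("UpSetR", quietly = TRUE)) {
  print(UpSetR::upset(UpSetR::fromList(species_sets), nsets = 4))
}
```

Because every species falls into exactly one region, the region counts add up to the 8 species in the example, which is the property asked for in the first comment.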