I'm trying to create a Venn Diagram in R to show whether certain tests on different machines are performed for all participants. In other words, I'm interested to see if certain tests for participants are performed on all three, two, one or none of the machines.

Here is an example of the data:

dat <- data.frame(id = 1:30,
                  machine1 = sample(0:7, 30, replace = TRUE),
                  machine2 = sample(0:3, 30, replace = TRUE),
                  machine3 = sample(0:6, 30, replace = TRUE))

These machine columns are sums of the original per-test columns. I have omitted those, but if it is easier they can be created with, for example, machine1test1 = sample(0:1, 30, replace = T), and so on.

So, if a participant had 2 tests on machine 1 and 3 tests on machine 2 and 0 tests on machine 3, it should add a value of 5 in the Venn diagram for the overlap between machine 1 and machine 2.

I have tried to follow several examples online, but they all seem to take string values as input for a Venn diagram. This would require me to restructure the data, and I was hoping it's possible without converting to strings. I've tried to follow these examples:

  • https://www.datanovia.com/en/blog/venn-diagram-with-r-or-rstudio-a-million-ways/
  • Making a venn diagram from a count table
  • How to add count values in venn diagram for more than 6 sets?
  • Create a Venn Diagram in R to represent rows with the same value from a dataframe

But none of those seem to fully apply, since they mostly apply to string values. Any help would be much appreciated!

1 Answer

The simplest way I can think of takes advantage of how my nVennR package (the CRAN version is unavailable at this time) labels regions in a Venn diagram (as explained at http://degradome.uniovi.es/vqf/SCD.html). You need an auxiliary function and some row processing:

library(nVennR)
dat <- data.frame(id = 1:30,
                  machine1 = sample(0:7, 30, replace = TRUE),
                  machine2 = sample(0:3, 30, replace = TRUE),
                  machine3 = sample(0:6, 30, replace = TRUE))

# Map a row of counts to its Venn region index: every machine with a count > 0
# sets one bit (last column = lowest bit); the +1 makes the index 1-based.
toBin <- function(l){
  result <- 0
  bit <- 0
  for (v in rev(l)){
    if (v > 0){
      bpos <- bitwShiftL(1, bit)
      result <- result + bpos
    }
    bit <- bit + 1
  }
  return(result + 1)
}

# One slot per region: 2^(number of machines), including the empty region
nReg <- bitwShiftL(1, ncol(dat) - 1)
sets <- as.list(rep(0, nReg))
# Add each participant's total number of tests to that participant's region
for (r in rownames(dat)){
  set <- toBin(dat[r, 2:ncol(dat)])
  sets[[set]] <- sets[[set]] + sum(dat[r, 2:ncol(dat)])
}

myV <- createVennObj(nSets = ncol(dat) - 1, sNames = colnames(dat[,2:ncol(dat)]), sSizes = sets)
myV <- plotVenn(nVennObj = myV)

And the result would be:

[image: the resulting Venn diagram]

The key is toBin, which converts the values in each row into a number whose binary representation has a 1 where the value is greater than zero and a 0 otherwise. The columns are read in reverse, so the last machine is the lowest bit, and 1 is added because R lists are indexed from 1. The result is the Venn region (set in the code) where the sum of the row's values (sum(dat[r, 2:ncol(dat)])) is stored. There is more information about nVennR in its vignette.
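
To make the mapping concrete, here is a small sketch that runs toBin on a few hand-written rows, including the 2/3/0 case from the question. It uses plain numeric vectors instead of data frame rows and does not need nVennR. The example rows are made up for illustration.

```r
# Same logic as toBin in the answer, repeated here so the snippet runs on its own
toBin <- function(l){
  result <- 0
  bit <- 0
  for (v in rev(l)){
    if (v > 0){
      result <- result + bitwShiftL(1, bit)
    }
    bit <- bit + 1
  }
  return(result + 1)
}

# Order of values: machine1, machine2, machine3 (illustrative rows, not real data)
toBin(c(2, 3, 0))  # 7: binary 110 (machine1 and machine2 only) is 6, plus 1
toBin(c(0, 0, 4))  # 2: binary 001 (machine3 only) is 1, plus 1
toBin(c(1, 1, 1))  # 8: binary 111 (all three machines) is 7, plus 1
toBin(c(0, 0, 0))  # 1: binary 000 (no machine), the empty region
```

So the participant with 2, 3 and 0 tests contributes their total of 5 to sets[[7]], the region shared by machine1 and machine2 only.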

vqf
  • I'm trying to wrap my head around the toBin function. How does the bitshift come into play? If you care to explain, or have any good source to dive into this, that would be much appreciated! Thanks again – Edifier8888 May 25 '23 at 13:22
  • Say you want a binary number like `1101`. That is `1000` (1 left-shifted three times) plus `100` (1 left-shifted twice) plus 1. That is what toBin does. The answer contains a link to http://degradome.uniovi.es/vqf/SCD.html where you can see how this relates to Venn diagram regions. – vqf May 25 '23 at 16:23
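
The decomposition of `1101` described in the last comment can be checked directly in base R. This is only a sketch of the arithmetic, using bitwShiftL, strtoi and intToBits from base R; the variable names are arbitrary.

```r
# 1101 in binary = 1000 + 100 + 1, each term being 1 shifted left by 3, 2 and 0 places
n <- bitwShiftL(1, 3) + bitwShiftL(1, 2) + bitwShiftL(1, 0)
n                                      # 13

# Cross-check by parsing the binary string
strtoi("1101", base = 2)               # 13

# Going the other way: the four lowest bits of n, highest bit first
bits <- rev(as.integer(intToBits(n))[1:4])
bits                                   # 1 1 0 1
```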