
Hi, I'm new to R and I have to use it to make a Venn diagram. I've been googling for a while, and all the examples I could find deal with binary variables. However, I have 2 lists (well, actually 2 CSV files). The items in the lists are just strings, like PSF113_xxxx. I need to compare them to see what is unique to each and what is shared. How would I make a Venn diagram out of this in R?

Also, the files don't have the same number of items in them; one has slightly more than the other, which means the cbind function returns an error.

I've come up with this so far, but it just gets me an image with a circle named Group 1 with a 1 inside and a 0 outside.

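library(limma)  # vennCounts() and vennDiagram() come from limma (Bioconductor)

matLis <- list(matrix(A), matrix(B))

# intended to pad the shorter matrix with NA rows so cbind() works;
# the padded result is not assigned to anything
n <- max(sapply(matLis, nrow))
do.call(cbind, lapply(matLis, function (x)
     rbind(x, matrix(, n-nrow(x), ncol(x))))) 

x = vennCounts(n)  # note: this passes n (a single number), not the padded matrix
vennDiagram(x)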
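For reference, here is a minimal sketch of the limma route the code above is attempting. It assumes limma (a Bioconductor package) may or may not be installed, and uses short made-up ID vectors (`idsA`, `idsB`, hypothetical) in place of the CSV columns. As far as I can tell, vennCounts() wants a matrix with one row per item and one 0/1 column per list; given a single number it sees one column called "Group 1", which is consistent with the plot described above. No padding or cbind of unequal-length columns is needed, because the rows are the union of both lists.

```r
# Hypothetical IDs standing in for the two CSV columns (lengths differ on purpose)
idsA <- c("PSF113_0016a", "PSF113_0018", "PSF113_0079")
idsB <- c("PSF113_0016a", "PSF113_0021", "PSF113_0079", "PSF113_0295")

# One row per unique ID across both lists, one 0/1 column per list
allIds <- sort(union(idsA, idsB))
membership <- cbind(A = as.integer(allIds %in% idsA),
                    B = as.integer(allIds %in% idsB))
rownames(membership) <- allIds

# Count and draw with limma only if it is installed
if (requireNamespace("limma", quietly = TRUE)) {
  counts <- limma::vennCounts(membership)
  limma::vennDiagram(counts)
}
```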

Here is an example of the data:

2 PSF113_0018
3 PSF113_0079
4 PSF113_0079a
5 PSF113_0079b

The numbering on the left isn't anything I've done; it was added when I imported the files into R from Excel.

> head(A)
            V1
1 PSF113_0016a
2  PSF113_0018
3  PSF113_0079
4 PSF113_0079a
5 PSF113_0079b
6 PSF113_0079c

> head(b,10)
             V1
1  PSF113_0016a
2   PSF113_0021
3   PSF113_0048
4   PSF113_0079
5  PSF113_0079a
6  PSF113_0079b
7  PSF113_0079c
8   PSF113_0295
9  PSF113_0324a
10 PSF113_0324b
TheFoxx

1 Answer


Your code still isn't quite reproducible because you haven't defined A or B. Here's a guide to making a Venn diagram with the venneuler package, which I've found more flexible.

List1 <- c("apple", "apple", "orange", "kiwi", "cherry", "peach")
List2 <- c("apple", "orange", "cherry", "tomatoe", "pear", "plum", "plum")
Lists <- list(List1, List2)  #put the word vectors into a list to supply lapply
items <- sort(unique(unlist(Lists)))   #put in alphabetical order
MAT <- matrix(0, nrow=length(items), ncol=length(Lists))  #make a matrix of 0's
colnames(MAT) <- paste0("List", seq_along(Lists))
rownames(MAT) <- items
for (i in seq_along(Lists)) {   #fill the matrix with each item's count in list i
    MAT[items %in% Lists[[i]], i] <- table(Lists[[i]])
}

MAT   #look at the results
library(venneuler)
v <- venneuler(MAT)
plot(v)
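If all you need are the shared and unique items themselves (rather than the plot), base R's set functions already work on vectors of different lengths, so nothing has to be padded or cbind-ed. A short sketch using the same List1 and List2:

```r
List1 <- c("apple", "apple", "orange", "kiwi", "cherry", "peach")
List2 <- c("apple", "orange", "cherry", "tomatoe", "pear", "plum", "plum")

shared <- intersect(List1, List2)  # items in both lists (duplicates dropped)
only1  <- setdiff(List1, List2)    # items only in List1
only2  <- setdiff(List2, List1)    # items only in List2

shared  # "apple" "orange" "cherry"
only1   # "kiwi" "peach"
only2   # "tomatoe" "pear" "plum"
```

These are the same three regions the Venn diagram shows; length() of each gives the counts.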

Edit: The head() output is very helpful, since it gives us something to work with. Try this approach:

#For reproducibility (skip this and read in the csv files)
A <- structure(list(V1 = structure(1:6, .Label = c("PSF113_0016a", 
    "PSF113_0018", "PSF113_0079", "PSF113_0079a", "PSF113_0079b", 
    "PSF113_0079c"), class = "factor")), .Names = "V1", 
    class = "data.frame", row.names = c("1", 
    "2", "3", "4", "5", "6"))

B <- structure(list(V1 = structure(1:10, .Label = c("PSF113_0016a", 
    "PSF113_0021", "PSF113_0048", "PSF113_0079", "PSF113_0079a", 
    "PSF113_0079b", "PSF113_0079c", "PSF113_0295", "PSF113_0324a", 
    "PSF113_0324b"), class = "factor")), .Names = "V1", 
    class = "data.frame", row.names = c("1", 
    "2", "3", "4", "5", "6", "7", "8", "9", "10"))

Then run the code from here:

#after reading in the csv files start here
Lists <- list(A, B)  #put the two data frames into a list to supply lapply
Lists <- lapply(Lists, function(x) as.character(unlist(x)))  #data frames -> character vectors
items <- sort(unique(unlist(Lists)))   #put in alphabetical order
MAT <- matrix(0, nrow=length(items), ncol=length(Lists))  #make a matrix of 0's
colnames(MAT) <- paste0("List", seq_along(Lists))
rownames(MAT) <- items
for (i in seq_along(Lists)) {   #fill the matrix with each item's count in list i
    MAT[items %in% Lists[[i]], i] <- table(Lists[[i]])
}

MAT   #look at the results
library(venneuler)
v <- venneuler(MAT)
plot(v)

The difference in this approach is that I unlist the two data frames (assuming they are data frames) and then convert them to character vectors. I think this should work.
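To see why the conversion matters: before R 4.0.0, read.csv() stored text columns as factors by default, so what comes back is a data frame holding a factor, not a plain vector of IDs. A small sketch with a made-up one-column data frame shaped like head(A):

```r
# Made-up data frame mimicking read.csv(header = FALSE) output,
# with the ID column stored as a factor
A <- data.frame(V1 = factor(c("PSF113_0016a", "PSF113_0018", "PSF113_0079")))

class(A)     # "data.frame": not a plain vector of IDs
class(A$V1)  # "factor"

# unlist() drops the data-frame wrapper (the result is still a factor);
# as.character() turns it into the ID strings themselves
ids <- as.character(unlist(A))
class(ids)   # "character"
```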

Tyler Rinker
  • A and B are just CSV files that I have imported; literally they are A = read.csv(...). The example of the data I gave is what I get if I call either A or B. Thanks though. I'll try this out – TheFoxx Jul 30 '12 at 13:56
  • try `head(A, 10)` and `head(B)` – Tyler Rinker Jul 30 '12 at 13:58
  • I got an error message from your code saying that the argument to the sort function must be atomic. Could you help? As I said before, when I call my data it's given in the form I showed in the OP – TheFoxx Jul 30 '12 at 14:20
  • @TheFoxx check this link for a reproducible example: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example We can't help with what we don't understand. I asked you to use `head(A, 10)` and `head(B, 10)` but you haven't. When you ask for help and don't give proper info, it's like going to the doctor and saying it hurts. "Hurts" tells him nothing. I can't reproduce your error because I never get past the first line of your code. – Tyler Rinker Jul 30 '12 at 14:32
  • I've included head(A) and head(B) in the post. Sorry I didn't before; I left them out because they're exactly the same as what I'd already shown – TheFoxx Jul 30 '12 at 14:36
  • That is useful, thanks. But A and B both have hundreds of entries; is there a way that doesn't involve me entering all of them manually? – TheFoxx Jul 30 '12 at 14:58
  • @TheFoxx read them in as you were doing before; the point where I've annotated `after reading in the csv files start here` is where you start running the code. – Tyler Rinker Jul 30 '12 at 15:03
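Putting the thread together, here is a sketch of the reading step, so you never have to type the entries in by hand. The real files aren't available, so it first writes two small hypothetical files to temporary paths; with real data, point read.csv() at your own file paths and skip that step. It assumes each file is a single column of IDs with no header row (which would explain the V1 column name in the head() output) and passes stringsAsFactors = FALSE so the IDs are read as plain strings.

```r
# Hypothetical stand-ins for the two CSV files (skip with real data)
fileA <- tempfile(fileext = ".csv")
fileB <- tempfile(fileext = ".csv")
writeLines(c("PSF113_0016a", "PSF113_0018", "PSF113_0079"), fileA)
writeLines(c("PSF113_0016a", "PSF113_0021", "PSF113_0079", "PSF113_0295"), fileB)

# header = FALSE names the single column V1; strings stay character
A <- read.csv(fileA, header = FALSE, stringsAsFactors = FALSE)
B <- read.csv(fileB, header = FALSE, stringsAsFactors = FALSE)

# Character vectors of IDs; different lengths are fine
Lists <- list(A$V1, B$V1)
lengths(Lists)  # 3 4
```

From there, the answer's code from `items <- sort(unique(unlist(Lists)))` onward should run unchanged.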