I figured out a way to accomplish this, but it requires a lot of guesswork. All the Venn or Euler diagram packages I have tried seem to only let you place the total number of occurrences inside each circle, not the items themselves.

The data:

name=c('itm1','itm2','itm3','itm4','itm5','itm6','itm7','itm8','itm9','itm0')
x=c(5,2,3,5,6,7,7,8,9,2)
y=c(6,9,9,7,6,5,2,3,2,4)
z=data.frame(name,x,y)

Plotting the points and labeling them:

plot(z$x,z$y,type='n')
text(z$x,z$y,z$name)

Manually placing the circles over the points:

# add=TRUE draws onto the existing plot, so par(new=TRUE) is not needed
symbols(3,7,circles=2.5,add=TRUE,bg='#34692499',inches=FALSE)
symbols(6,6,circles=1.5,add=TRUE,bg='#64392499',inches=FALSE)
symbols(8,3,circles=2,add=TRUE,bg='#24399499',inches=FALSE)

So this is a really tedious process: I have to give each item an x and y coordinate, then guess where to place the circles and what radius to give them.

Ideally I would like to use the dataset I initially had which looks like this:

cat1=c('itm2','itm3','itm0')
cat2=c('itm1','itm4','itm5','itm6')
cat3=c('itm6','itm7','itm8','itm9')

And then just assign the points into the right circle. Is there a better way of doing this?

Technophobe01
thequerist
  • so the x and y values where points are plotted don't matter to you? How did you determine the circle size? – MrFlick Jun 07 '17 at 03:06
  • I just manually placed them going back and forth between plot and source. The x and y values don't matter so long as the ones going in the same circle are bunched together. After I plotted the points, I made the circles in the same tedious manner by seeing if the circle encompassed the points that are supposed to be within it, and if not I changed its coordinates and size accordingly. Then I moved some of the points around again so that they looked better. It's not much of an issue for the sample of 10 items, but my data has more and I'm sure other people have larger data sets as well. – thequerist Jun 07 '17 at 03:14
  • Do you want to acknowledge overlapping regions when placing points (=> item 6 must be in the overlap region from cat3 and cat2)? In that case, you might want to incorporate R's spatial packages. – lukeA Jun 07 '17 at 09:14
  • @lukeA Sorry for the late response, was busy with work. Thanks for pointing me in the direction of spatial but I would need to have either coordinates or have each item associated with one category. As far as I know, no spatial package or GIS allows a point to exist within the boundaries of two polygons. However, you made me think of this in a different way and I decided to use the igraph package to place the items almost where they should be and then I just have to draw circles or ellipses over them. Not perfect but better than what I had before. – thequerist Aug 23 '17 at 16:56
  • maybe useful https://stackoverflow.com/questions/25019794/venn-diagram-with-item-labels?answertab=votes#tab-top – user20650 Oct 24 '17 at 13:45
  • @user20650 That does exactly what I am looking for but unfortunately it limits that functionality to two data sets. In the example above I have three sets and in my real world data I have six. – thequerist Oct 29 '17 at 16:43
  • @thequerist ; yes, i realised after i posted comment. I hoped RAM package did something clever which you could tweak, but unfortunately it doesn't - it just extended the code from my answer - however did you try [the code from Scott's comment](https://stackoverflow.com/questions/25019794/venn-diagram-with-item-labels?answertab=votes#comment63435226_25027009)? – user20650 Oct 29 '17 at 16:50
  • @thequerist Can you clarify why you mapped the first circle [Green] as you did in the example? What sets are you mapping into each circle and why? That will help provide an automated answer. Just making sure I understand your Venn mapping logic, i.e. what triggers you to include an item in each circle? – Technophobe01 Oct 29 '17 at 16:53
  • oh, and just checked venn.diagram only accepts up to 5 groups (not 6 ) – user20650 Oct 29 '17 at 17:01
  • @Technophobe01 My circles are wrong in that example, as is the placement of the labels. Where the labels and circles wind up on the plot is irrelevant. What matters to me is that the items in each category wind up within a circle or ellipse of that category. However, some items are in more than one category and thus will need to be inside more than one circle. Here is an image that shows what I am going for: http://www.learnnc.org/lp/media/authors/walbert/venn/animals-10.png where the animals are the items and the characteristics are the categories. – thequerist Oct 29 '17 at 17:36
  • @user20650 Yes, I did, and that did add a new set of labels in a new area adjoining one of the ellipses, however it also repeated some of the items already existing on the plot which is not what I am going for. I tinkered with the code a bit, but no luck. I can see why venn.diagram has a limit as to how many groups it is willing to take on. Once you get up there it gets really difficult to maintain circles or ellipses as the grouping visual and might have to gerrymander the region if there are too many variables sharing groups. – thequerist Oct 29 '17 at 18:21
  • @thequerist ; yup I think that's the main point - it will get tricky with more groups. – user20650 Oct 29 '17 at 18:35
  • @thequerist Did I address your problem? – Technophobe01 Nov 02 '17 at 01:14
  • @Technophobe01 Unfortunately, I was looking for a Venn diagram solution, I am aware however that the more groups there are the more difficult it is, but the groups I had would allow for a non-messy Venn. – thequerist Jun 07 '18 at 02:10

2 Answers

Based on the thread discussion, my recommendation is to use the UpSetR R package.

OK, why?

My personal feeling is that if we have more than five or seven groups the Venn diagram approach breaks down. For an overview of the various options available in this context I recommend you review:

The other useful website, in my view, is:

Together they give good coverage of the options available.

Thus, my sense is that the core challenge here is the combinatorial explosion in the number of set intersections once the number of sets exceeds a trivial threshold. So how do we address it?
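To put a rough number on that growth: n sets can form up to 2^n - 1 distinct non-empty regions (every combination of "in" or "out" per set, minus the all-out case). A minimal sketch in base R; the variable names are only illustrative:

```r
# Upper bound on the number of non-empty regions a diagram of n sets
# may need to show: every in/out combination except "in none of them".
n_sets  <- 1:7
regions <- 2^n_sets - 1

# One row per number of sets
print(data.frame(sets = n_sets, regions = regions))
```

With the six sets mentioned in the comments, a complete Venn diagram would have to provide room for up to 63 regions, even if most of them turn out to be empty in the actual data.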


Proposed Solution: UpSet

UpSet focuses on creating a task-driven, aggregate view of the data relationships. It communicates the size and properties of aggregates and intersections. To me, at least, this seems a better way to present many sets, so it is my recommendation.

At the very least it is an alternative approach. I hope it helps.
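As a rough sketch of how this could look with the three category vectors from the question: the base R part below builds a 0/1 membership table and lists the item labels per combination, and the plotting part uses `fromList()` and `upset()` from UpSetR. The names `sets`, `membership`, `combo` and `by_combo` are illustrative. Note that the UpSet plot itself shows intersection sizes, not item labels, so the labels come from the `split()` step.

```r
cat1 <- c('itm2', 'itm3', 'itm0')
cat2 <- c('itm1', 'itm4', 'itm5', 'itm6')
cat3 <- c('itm6', 'itm7', 'itm8', 'itm9')
sets <- list(cat1 = cat1, cat2 = cat2, cat3 = cat3)

# 0/1 membership table: one row per item, one column per category
items <- sort(unique(unlist(sets)))
membership <- as.data.frame(
  sapply(sets, function(s) as.integer(items %in% s))
)

# Label each item with the combination of categories it belongs to,
# then list the items per combination
combo <- apply(membership, 1,
               function(r) paste(names(sets)[r == 1], collapse = "&"))
by_combo <- split(items, combo)
print(by_combo)

# Draw the UpSet plot only if the package is installed
if (requireNamespace("UpSetR", quietly = TRUE)) {
  UpSetR::upset(UpSetR::fromList(sets), order.by = "freq")
}
```

Here `by_combo` contains an entry named `cat2&cat3` holding `itm6`, the only item that sits in an overlap.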

UpSet Reference Materials:

UpSetR Vignettes

There are currently four vignettes that explain how to use the features included in the UpSetR package:

UpSet Movie DataSet Example 1

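# install UpSetR if it is missing, then load it
if (!requireNamespace("UpSetR", quietly = TRUE)) install.packages("UpSetR")
library(UpSetR)

# movies.csv ships with the package and is semicolon-separated
movies <- read.csv(system.file("extdata", "movies.csv", package = "UpSetR"), 
                   header = TRUE, sep = ";")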

upset(movies, nsets = 6, number.angles = 30, point.size = 3.5, line.size = 2, 
      mainbar.y.label = "Genre Intersections", sets.x.label = "Movies Per Genre", 
      text.scale = c(1.3, 1.3, 1, 1, 2, 0.75))

UpSet Movie DataSet Example 2

upset(movies, sets = c("Action", "Adventure", "Comedy", "Drama", "Mystery", 
                       "Thriller", "Romance", "War", "Western"), mb.ratio = c(0.55, 0.45), order.by = "freq")

Technophobe01

If you don't mind doing this manually, you can speed the process up a lot by using locator:

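pts <- locator(2)
# click first at the circle centre, then somewhere on the circle edge
# radius = Euclidean distance between the two clicked points
symbols(pts$x[1], pts$y[1], 
  circles = sqrt(diff(pts$x)^2 + diff(pts$y)^2), 
  # adjustcolor() is base R; alpha() would need the scales package
  add = TRUE, bg = adjustcolor('orange', alpha.f = .2), inches = FALSE)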
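If clicking is still too slow for larger data, the centre and radius can also be computed from the coordinates of the items in each category. This is a minimal sketch, assuming the data frame `z` and the vectors `cat1`, `cat2` and `cat3` from the question. `enclosing_circle()` and its `pad` argument are hypothetical names, not part of any package. A centroid-based circle may also cover items from other categories, so the result can still need manual adjustment.

```r
name <- c('itm1','itm2','itm3','itm4','itm5','itm6','itm7','itm8','itm9','itm0')
x <- c(5, 2, 3, 5, 6, 7, 7, 8, 9, 2)
y <- c(6, 9, 9, 7, 6, 5, 2, 3, 2, 4)
z <- data.frame(name, x, y)

cat1 <- c('itm2', 'itm3', 'itm0')
cat2 <- c('itm1', 'itm4', 'itm5', 'itm6')
cat3 <- c('itm6', 'itm7', 'itm8', 'itm9')

# Hypothetical helper: circle centred on the centroid of the items,
# with a radius reaching the farthest item plus some padding
enclosing_circle <- function(items, coords, pad = 0.5) {
  pts <- coords[coords$name %in% items, ]
  cx <- mean(pts$x)
  cy <- mean(pts$y)
  r <- max(sqrt((pts$x - cx)^2 + (pts$y - cy)^2)) + pad
  c(x = cx, y = cy, r = r)
}

circ <- lapply(list(cat1, cat2, cat3), enclosing_circle, coords = z)
cols <- c('#34692499', '#64392499', '#24399499')

# asp = 1 keeps the circles round; the limits leave room for the radii
plot(z$x, z$y, type = 'n', asp = 1, xlim = c(-2, 12), ylim = c(-2, 12))
text(z$x, z$y, z$name)
for (i in seq_along(circ)) {
  symbols(circ[[i]][["x"]], circ[[i]][["y"]], circles = circ[[i]][["r"]],
          add = TRUE, bg = cols[i], inches = FALSE)
}
```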