
I have a discrete data set (i.e. all values are integers), and many of the points overlap exactly on a 2D plot. I therefore thought I would plot it as a contour plot, or colour the data points according to the number of overlapping points (scaling point size by count does not work, as I just get one giant circle bigger than all the other circles).

I am aware that a number of posts already address my question, and I have tried their methods.

For instance, using this post and this post, I tried to do a heat overlay. However, my contour lines cover only a very small region of the plot, as seen here. I originally thought that just meant there was no overlap except in that middle region, but after using the jitter function I could see that there was lots of overlap among data points that were not enclosed by the contours, as seen here. I then tried this solution, where overlapping points are represented by different colours. I followed the instructions step by step, except with my own data, but when I checked the density of individual data points on my own (e.g. counting all data points where x = 2 and y = -1), the calculated density was incorrect.
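For reference, the kind of manual check I mean can be sketched as below. This is only a minimal sketch: it uses the first ten rows of my sample data, and the column names `x` and `y` and the helper column `n` are names I made up for the example. It tabulates how many rows share each exact (x, y) pair and compares one pair against a direct count.

```r
# First 10 rows of the sample data, with the two columns renamed to
# x and y for convenience (the renaming is an assumption for this sketch).
d <- data.frame(
  x = c(0, 1, 2, 1, -1, 0, 0, 0, 0, 0),
  y = c(2, 0, 0, 0, 0, 0, 0, 0, 0, 0)
)

# Count how many rows share each exact (x, y) pair.
# transform() adds a helper column n = 1 that aggregate() then sums.
counts <- aggregate(n ~ x + y, data = transform(d, n = 1L), FUN = sum)

# Direct count for a single coordinate, here x = 0 and y = 0.
manual <- sum(d$x == 0 & d$y == 0)

# The tabulated count for the same pair should agree with the direct count.
tabulated <- counts$n[counts$x == 0 & counts$y == 0]
print(counts)
print(c(manual = manual, tabulated = tabulated))
```

Any colouring scheme that claims to show the number of overlapping points should reproduce these exact counts; a smoothed density estimate generally will not, since it spreads each point over its neighbourhood.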

When I try it with smaller subsets of my dataset, the contour plot seems to work. I know we are not supposed to attach files here, but if anyone is interested I could add a link to my data (it's just an x and a y column, about 20000 rows). Does anyone have any insight into what I'm doing wrong? Thanks!

Edit: here is the output from dput(head(data, n = 100)):

structure(list(`77-57` = c(0, 1, 2, 1, -1, 0, 0, 0, 0, 0, 0, 
0, 1, 0, 1, 2, 3, 1, 2, 4, 3, -1, 1, -1, -1, -1, -1, -1, -38, 
-1, 1, -1, -1, 0, 0, 0, 0, -3, 0, 0, -1, -1, 0, -1, -1, 0, -2, 
2, 1, 1, 1, 2, 0, 1, 0, 0, -1, 1, 2, 1, 0, 0, 0, 1, 0, 1, 0, 
0, 0, 0, 1, 0, 0, 1, 1, 4, 1, 1, 0, 0, 2, 0, -1, 1, 1, -1, -1, 
1, -3, 2, 6, 3, 3, 5, 2, 2, 2, 0, 0, 0), `308-77` = c(2, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, -1, 0, 0, -4, -4, 0, 0, 
0, 0, 0, 0, 0, 17, 1, 0, 1, 1, 0, 0, 0, 0, 3, 1, 0, 0, 2, 0, 
2, 2, 1, 1, -2, 0, 0, 0, 0, 0, -2, 0, 0, 0, -2, -1, 0, 1, 1, 
1, -2, 1, -1, 0, 0, 0, 0, -1, 1, 1, -1, -1, 1, 0, 0, 0, 0, 0, 
3, 2, 0, 0, 0, 0, -1, -6, 0, 1, -1, -1, 0, -2, -2, -2, 1, 1, 
1)), .Names = c("77-57", "308-77"), row.names = c(1L, 2L, 3L, 
4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 22L, 
23L, 26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L, 34L, 38L, 39L, 40L, 
41L, 42L, 43L, 44L, 45L, 46L, 47L, 48L, 49L, 50L, 52L, 53L, 55L, 
59L, 60L, 61L, 63L, 64L, 65L, 66L, 67L, 68L, 69L, 72L, 73L, 74L, 
75L, 76L, 77L, 78L, 79L, 80L, 81L, 83L, 84L, 85L, 86L, 87L, 88L, 
89L, 90L, 91L, 92L, 93L, 94L, 95L, 96L, 97L, 98L, 99L, 100L, 
101L, 103L, 104L, 105L, 106L, 107L, 108L, 109L, 110L, 111L, 112L, 
113L, 114L, 115L, 116L, 117L, 118L, 119L, 120L), class = "data.frame")
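To make the goal concrete, here is a rough sketch of the plot I am after, using only the first ten rows of the sample above and base graphics. The names `d`, `n`, `u` and `pal` are made up for this example, and treating `77-57` as x and `308-77` as y is my own assumption. Each distinct (x, y) pair is drawn once and coloured by the exact number of rows that fall on it.

```r
# First 10 rows of the dput() sample; treating `77-57` as x and
# `308-77` as y is an assumption made for this sketch.
d <- data.frame(
  x = c(0, 1, 2, 1, -1, 0, 0, 0, 0, 0),
  y = c(2, 0, 0, 0, 0, 0, 0, 0, 0, 0)
)

# Attach to every row the number of rows sharing its exact (x, y) pair.
d$n <- ave(rep(1L, nrow(d)), d$x, d$y, FUN = sum)

# Keep one row per distinct pair so each location is drawn only once.
u <- unique(d)

# One colour per possible count, from grey (few points) to red (many).
pal <- colorRampPalette(c("grey70", "red"))(max(u$n))

# Scatter plot of the distinct locations, coloured by overlap count.
plot(u$x, u$y, col = pal[u$n], pch = 19, cex = 1.5,
     xlab = "77-57", ylab = "308-77")
lv <- sort(unique(u$n))
legend("topright", legend = lv, col = pal[lv], pch = 19,
       title = "overlapping points")
```

Because every location is still an individual point, single points can later be highlighted on top of this plot with `points()`.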
ehze
  • 1
    Run `dput(head(yourdata, n = 100))` so we have sample data to work with. Also are you tied to using points? This is a textbook application of a heatmap. – Zach Nov 01 '16 at 21:44
  • 1
    The other package that might be useful is hexbin. The hexbinplot function does a nice job, and I think there is even a ggplot version of it. Oh, never mind. I see that you already saw that option and for some reason (that you do not state) are not happy with it. – IRTFM Nov 01 '16 at 21:49
  • I did consider heatmap and hexbin, but the individual points are important: I will be looking at the biological significance of individual points later on and will need to highlight them in the plot for a presentation. So a scatter plot is more aesthetically pleasing and makes more sense, because I am looking at those data points in relation to other data points. @Zach, I have updated my question with the dput output, but as I said, with a smaller subset the contour plot option seems to work. – ehze Nov 01 '16 at 21:59

0 Answers