
I have a large data file with a little over 1000 points, created by merging 23 separate .csv files into one data frame (called alltraj).

It has the following format:

    Track   X1        X         Y
1    Point    1 148.5000 306.83333
2    Point    2 149.8333 306.83333
3    Point    3 151.8333 307.16667
4    Point    4 152.5000 308.16667
5    Point    5 156.1667 309.16667
6    Point    6 159.1667 311.16667
7    Point    7 163.1667 311.83333
8    Point    8 166.5000 313.50000
9    Point    9 170.5000 316.16667
10   Point   10 177.1667 321.50000

where X1 is the time step, and X and Y are the positions of a fish.

I am trying to make a heatmap of the frequencies of my X vs Y trajectories using the following code:

(p <- ggplot(alltraj, aes(Y,X)) + 
    geom_tile(aes(fill = X1), colour = "white") + 
    scale_fill_gradient(low = "white",high = "steelblue"))

However, the heatmap comes up completely blank and looks like this:

[Image: the resulting blank heatmap]

Can anyone tell me what I'm missing in my code that makes it come up blank? Thanks in advance!

EDIT: Here is the (unfortunately messy) dput() output for the first 50 rows of my data, i.e. dput(head(alltraj, n = 50)):

structure(list(Track = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "Point", class = "factor"), 
X1 = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 
13L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 
26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L, 34L, 35L, 36L, 37L, 
38L, 39L, 40L, 41L, 42L, 43L, 44L, 45L, 46L, 47L, 48L, 49L, 
50L, 51L), X = c(148.5, 149.8333333, 151.8333333, 152.5, 
156.1666667, 159.1666667, 163.1666667, 166.5, 170.5, 177.1666667, 
180.8333333, 183.5, 186.1666667, 191.8333333, 192.8333333, 
194.1666667, 195.8333333, 195.5, 196.8333333, 197.1666667, 
197.1666667, 197.1666667, 198.5, 198.5, 198.8333333, 198.5, 
197.5, 198.8333333, 199.5, 199.5, 199.5, 199.8333333, 200.8333333, 
199.8333333, 201.1666667, 201.8333333, 202.5, 203.1666667, 
203.1666667, 203.1666667, 204.5, 204.5, 204.5, 204.5, 204.8333333, 
203.8333333, 203.8333333, 204.8333333, 206.1666667, 206.5
), Y = c(306.8333333, 306.8333333, 307.1666667, 308.1666667, 
309.1666667, 311.1666667, 311.8333333, 313.5, 316.1666667, 
321.5, 323.8333333, 325.8333333, 326.1666667, 327.5, 332.1666667, 
338.1666667, 341.5, 346.5, 351.1666667, 355.8333333, 360.1666667, 
364.1666667, 368.5, 371.8333333, 375.1666667, 376.5, 381.8333333, 
385.8333333, 389.5, 392.8333333, 395.8333333, 400.1666667, 
405.1666667, 408.8333333, 413.5, 417.1666667, 420.8333333, 
424.8333333, 427.8333333, 429.8333333, 433.1666667, 434.5, 
435.1666667, 435.1666667, 436.8333333, 436.8333333, 437.5, 
438.8333333, 439.8333333, 440.1666667)), row.names = c(NA, 
50L), class = "data.frame")

It seems as if the scale is off in my heatmap. When I plot the first 10 rows I get nicely sized points (tiles), but with the first 20 rows they get smaller, and with 30 they are smaller still. At 50 they're practically unreadable. How do I make the points bigger?

  • I have copied your data and code and executed it. It produced a plot with tiles where you would expect them. I do see that your X-axis starts from 200, and you do not have any observations with `x > 200`. – KoenV Dec 18 '18 at 14:17
  • off-topic: if you intend to track the positions of the fish over time you might want to try `geom_path`. Something like `ggplot(alltraj, aes(Y,X)) + geom_path() + geom_point(aes(fill = X1), size = 2, shape = 21)`. You could also map `size` to `X1` in `geom_point`. – markus Dec 18 '18 at 14:31

1 Answer


Your code produces the following plot:

alltraj <- readr::read_table("Track   X1        X         Y
Point    1 148.5000 306.83333
Point    2 149.8333 306.83333
Point    3 151.8333 307.16667
Point    4 152.5000 308.16667
Point    5 156.1667 309.16667
Point    6 159.1667 311.16667
Point    7 163.1667 311.83333
Point    8 166.5000 313.50000
Point    9 170.5000 316.16667
Point   10 177.1667 321.50000")

library("ggplot2")

p <- ggplot(alltraj, aes(Y,X)) + 
    geom_tile(aes(fill = X1), colour = "white") + 
    scale_fill_gradient(low = "white",high = "steelblue")

p

[Plot: the resulting heatmap, which is not empty]

The heatmap is not empty. This likely means your reproducible example does not actually reproduce the issue... What happens when you run the code above?

Empty plots typically happen when you forget to add a layer (i.e. you call ggplot() but no geom_XXX()). They also happen when you think you have added a layer but forgot the + sign...

For example, you'll end up with an empty plot if you run the following (note the missing + at the end of the first line):

p <- ggplot(alltraj, aes(Y,X)) 
    geom_tile(aes(fill = X1), colour = "white") + 
    scale_fill_gradient(low = "white",high = "steelblue")
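A quick way to tell which case you are in is to count the layers stored in the plot object. The sketch below assumes ggplot2 is installed and uses a three-row excerpt of the question's data; a plot built without the + should have no layers at all.

```r
library(ggplot2)

# Three rows from the question's data
alltraj <- data.frame(
  X1 = 1:3,
  X  = c(148.5, 149.8333, 151.8333),
  Y  = c(306.83333, 306.83333, 307.16667)
)

# Missing "+": only the base ggplot() call is stored, no layer is attached
p_empty <- ggplot(alltraj, aes(Y, X))

# Complete chain: geom_tile() is attached to the plot as a layer
p_full <- ggplot(alltraj, aes(Y, X)) +
  geom_tile(aes(fill = X1), colour = "white") +
  scale_fill_gradient(low = "white", high = "steelblue")

# Compare the number of layers in each plot object
length(p_empty$layers)
length(p_full$layers)
```

If the first count is zero for your real plot, look for a missing + rather than a problem with the data.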

Edit after investigation

geom_tile draws a tile centered on each X and Y position you specify, which implies your X and Y values are already arranged on a regular grid! That is not the case here, so you end up with minuscule tiles (by default, the tile width is the smallest gap between x values and the tile height the smallest gap between y values).
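To see why the tiles shrink as more rows are added, here is a rough base-R sketch using the first 10 rows from the question. It assumes that geom_tile's default tile size is the smallest gap between distinct values on each axis (the quantity ggplot2::resolution() computes); the helpers min_gap() and relative_tile() are hypothetical names made up for this illustration.

```r
# First 10 rows of alltraj from the question (base R only)
alltraj <- read.table(header = TRUE, text = "
Track X1 X Y
Point 1 148.5000 306.83333
Point 2 149.8333 306.83333
Point 3 151.8333 307.16667
Point 4 152.5000 308.16667
Point 5 156.1667 309.16667
Point 6 159.1667 311.16667
Point 7 163.1667 311.83333
Point 8 166.5000 313.50000
Point 9 170.5000 316.16667
Point 10 177.1667 321.50000
")

# Hypothetical helper: smallest gap between distinct values,
# assumed to approximate geom_tile()'s default tile width/height
min_gap <- function(v) min(diff(sort(unique(v))))

# Hypothetical helper: tile size as a fraction of the axis range
relative_tile <- function(v) min_gap(v) / diff(range(v))

# aes(Y, X) puts Y on the x-axis (tile width) and X on the y-axis (tile height)
for (n in c(5L, 10L)) {
  sub <- head(alltraj, n)
  cat(sprintf("n = %2d: width = %.3f of x-range, height = %.3f of y-range\n",
              n, relative_tile(sub$Y), relative_tile(sub$X)))
}
```

The relative tile size drops as n grows: the axis ranges widen while the smallest gap stays tiny, which is consistent with the shrinking tiles described in the question.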

The solution is to use geom_bin2d, which first bins your data (i.e. creates a coarse grid and counts the points falling in each cell) and only then plots.

On your small example the difference is not dramatic, but the tiles are already bigger. You can tune binwidth to whatever suits your data.

ggplot(alltraj, aes(x=Y, y=X)) + 
    geom_bin2d(colour = "white", binwidth=10)

[Plot: heatmap produced with geom_bin2d]
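Conceptually, the binning step can be sketched by hand in base R: snap each point to a grid cell, then count the points per cell, which is the kind of count geom_bin2d maps to fill. This is an illustrative approximation under stated assumptions, not ggplot2's implementation; stat_bin2d picks its own bin boundaries, so its cells may be offset from the floor()-based cells used here.

```r
# First 10 rows of alltraj from the question (base R only)
alltraj <- read.table(header = TRUE, text = "
Track X1 X Y
Point 1 148.5000 306.83333
Point 2 149.8333 306.83333
Point 3 151.8333 307.16667
Point 4 152.5000 308.16667
Point 5 156.1667 309.16667
Point 6 159.1667 311.16667
Point 7 163.1667 311.83333
Point 8 166.5000 313.50000
Point 9 170.5000 316.16667
Point 10 177.1667 321.50000
")

binwidth <- 10

# Snap each point to the lower-left corner of its grid cell.
# As in the plot, Y goes on the x-axis and X on the y-axis.
cell_x <- floor(alltraj$Y / binwidth) * binwidth
cell_y <- floor(alltraj$X / binwidth) * binwidth

# Count points per cell and drop empty cells;
# this count plays the role of the heatmap fill
counts <- as.data.frame(table(x = cell_x, y = cell_y))
counts <- counts[counts$Freq > 0, ]
counts
```

Each cell now holds a count rather than a single X1 value, which is why X1 no longer fits as the fill.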

However, that does mean you can't use X1 as fill (it does not make sense in a binned context). If you need to plot X1, that suggests you are using the wrong geom and a heatmap may not be appropriate. Markus suggests in the comments that you may be interested in geom_path.

asachet
  • I get the same map as you if I copy your code. The issue is that I have 1000+ rows in my table so I can't use read_table and copy my data in. How can I compile them in a way that this will work for all of my data points? – Chandler Nelson Dec 18 '18 at 14:27
  • @ChandlerNelson Try to produce a reproducible example - I have no idea why your plot ends up empty, especially since on a small example it does seem to work. Did you explicitly limit your x and y axis? Also, you're aware you're plotting the X variable on the y-axis and vice versa? – asachet Dec 18 '18 at 14:31
  • @antoine-sac Add `+ ylim(c(100, 700)) + xlim(c(100, 500))` to your plot and you'll get an idea of OP's plot. My guess is that there are not enough observations. – markus Dec 18 '18 at 14:37
  • @markus: I have a good idea of what OP's plot looks like; this is actually why I asked whether they explicitly limited the axis range. Re the number of observations: according to the post, there are at least 10 observations in the dataset, so there is no reason for it to be empty. I don't think more can be done until OP provides a reproducible example. – asachet Dec 18 '18 at 14:40
  • @antoine-sac sorry I am very very new to both R and this website. I'm not sure how to make a reproducible example here. I think that it would work if I could read in the data set in the same way that you did but only if it's adjusted for a larger size. Do you know how I could do that? – Chandler Nelson Dec 18 '18 at 14:46
  • Try to isolate a subset of data that reproduces the error. For example, use `head(alltraj)` instead of `alltraj`: this will make the plot using only the first few lines of data. If the plot is empty, then share `head(alltraj)` with us. If the plot works, then try to "make it fail" by using more data with `head(alltraj, n=50)` (will use the first 50 lines). As soon as you get your empty plot, share the data and code you actually used. You can use `dput(head(alltraj))` to get data in shareable (albeit ugly) format. – asachet Dec 18 '18 at 14:51
  • See also https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example and https://community.rstudio.com/t/faq-whats-a-reproducible-example-reprex-and-how-do-i-do-one/5219 – asachet Dec 18 '18 at 14:52
  • @antoine-sac okay so I did what you said and I think one of my issues is the scale. I used head(alltraj) and it came out okay so then I started increasing n and noticed that the points just got smaller and smaller. What is the easiest way for me to share the output for dput(head(alltraj, n=50))? – Chandler Nelson Dec 18 '18 at 15:01
  • I see - well looks like you figured out the problem: `geom_tile` is not suited to your data as it produces minuscule tiles. Try `geom_bin2d` to create a 2D heatmap. The difference is that `geom_bin2d` uses `stat_bin2d` which bins your data (like a 2D histogram) instead of literally plotting the `x` and `y` values. – asachet Dec 18 '18 at 15:07
  • @antoine-sac okay we're getting closer to what I need. I think I can figure it out from here (fingers crossed). Thank you for bearing with me! – Chandler Nelson Dec 18 '18 at 15:13
  • @ChandlerNelson You're welcome, I edited my answer to reflect our comments. Good luck :) – asachet Dec 18 '18 at 15:18
  • @ChandlerNelson: If you have a lot of data, an alternative to a proper heatmap is to plot large, transparent points. It is a bit more computationally expensive but it can look really good. For example `geom_point(alpha=0.1, size=3)`. One advantage is that you can color the points which is nice if you have several trajectories to plot. Combine with `geom_path` for optimal results! – asachet Dec 18 '18 at 15:23
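The last suggestion (large transparent points combined with geom_path) might look like the sketch below, assuming ggplot2 is installed and using the 10 sample rows from the question. Coloring by X1 keeps the time information that a binned heatmap loses; the alpha and size values are only starting points to tune.

```r
library(ggplot2)

# First 10 rows of alltraj from the question
alltraj <- read.table(header = TRUE, text = "
Track X1 X Y
Point 1 148.5000 306.83333
Point 2 149.8333 306.83333
Point 3 151.8333 307.16667
Point 4 152.5000 308.16667
Point 5 156.1667 309.16667
Point 6 159.1667 311.16667
Point 7 163.1667 311.83333
Point 8 166.5000 313.50000
Point 9 170.5000 316.16667
Point 10 177.1667 321.50000
")

p <- ggplot(alltraj, aes(x = Y, y = X)) +
  # Connect successive positions in row (time) order
  geom_path(colour = "grey60") +
  # Large, semi-transparent points: overlapping points look more opaque
  geom_point(aes(colour = X1), alpha = 0.3, size = 3)

p
```

With several trajectories in one data frame, an identifier column per trajectory (not present in the sample shown) could be mapped to group so that geom_path does not join separate tracks.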