
EDIT: clarified the description and added a code example and plots.

I have a data set with locations of several animals.

I created a grid of location scatter plots, one per animal. Because the x and y axes of each plot are distances, I want x and y on the same scale within each plot (so distances are not distorted) and across plots (so I can compare different plots at the same scale).

Faceting is a natural choice for this, and it works with coord_fixed(). However, it becomes more complicated when there are outliers in the data (which could be errors). I modified the example from @Mark Peterson's great answer to add some outlier points.

library(ggplot2)

set.seed(8675309)
df <-
  data.frame(
    x = runif(40, 1, 20)
    , y = runif(40, 100, 140)
    , ind = sample(LETTERS[1:4], 40, TRUE)
  )
# add some outliers to stretch the plot
outliers <- data.frame(x = c(-100, 30, 60,-50),
                       y = c(20, 200, -100, 500),
                       ind = LETTERS[1:4])
df <- rbind(df, outliers)

ggplot(df , aes(x = x, y = y)) +
  geom_point() +
  facet_wrap(~ind) +
  coord_fixed()

This is what we got:

1. Facet plot with coord_fixed(): consistent scales, aligned axes

This plot satisfies both the fixed aspect ratio requirement and the consistent scale requirement. It also has all axes aligned, i.e. xlim and ylim are the same in every panel. This is useful because it shows the positions of the animals relative to each other.

I also want to look at the pattern in each plot and compare them. Keeping the facet plot for relative position, I want to add another set of plots that have consistent scales but axes that are not aligned. If you draw each plot individually, ggplot chooses xlim and ylim to just cover the data, with no alignment requirement. So I just need to draw each plot and arrange them with gridExtra or cowplot.

Then, to deal with the outliers, our plan is to add a zoom button that zooms in on all plots at once (the plots will be in a Shiny app).

We decided to center every plot on its centroid. Although this wastes more space, with every plot centered correctly, zooming them all will show the bulk of the data in each plot, and the plots remain comparable in scale.
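A minimal sketch of what the zoom step could compute, assuming the zoom button supplies a single zoom factor. The helper name `zoom_limits` and its `zoom` argument are made up for illustration; they are not part of ggplot2 or Shiny. It shrinks an axis range around its midpoint, so a plot that is already centered stays centered.

```r
# Hypothetical helper (not from any package): shrink an axis range around
# its midpoint. zoom > 1 zooms in, zoom < 1 zooms out, zoom = 1 is a no-op.
zoom_limits <- function(lim, zoom = 2) {
  stopifnot(length(lim) == 2, zoom > 0)
  center <- mean(lim)
  half_width <- unname(diff(lim)) / 2 / zoom
  c(center - half_width, center + half_width)
}

# Zooming in 2x on the range -100..100 keeps the center at 0
zoomed <- zoom_limits(c(-100, 100), zoom = 2)
zoomed
```

The result could then be passed to `coord_fixed(xlim = ..., ylim = ...)`. Applying the same factor to both axes of every plot should keep the scales comparable, provided they were comparable before zooming.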

I wrote a function to center each plot on its median, somewhat similar to @Mark Peterson's code.

I know a median center is not well defined for 2D points, but it's good enough for my needs. Because I need to adjust each plot individually, I cannot use facets anymore.

# given a vector, get limits that put its median at the center
expand_1D_center <- function(vec){
  center <- median(vec)
  new_diff <- max(center - min(vec), 
                  max(vec) - center)
  return(c(new_min = center - new_diff, 
           new_max = center + new_diff))
}
# given x y vectors, get new x y lim to make centroid center
expand_2D_center <- function(x_vec, y_vec){
  return(list(xlim = expand_1D_center(x_vec),
              ylim = expand_1D_center(y_vec)))
}
# plot each with center adjusted
id_vector <- sort(unique(df$ind))
g_list <- vector("list", length = length(id_vector))
for (i in seq_along(id_vector)) {
  data_i <- df[df$ind == id_vector[i], ]
  new_lim <- expand_2D_center(data_i$x, data_i$y)
  g_list[[i]] <- ggplot(data = data_i, aes(x, y)) +
    geom_point() +
    coord_fixed(xlim = new_lim$xlim, ylim = new_lim$ylim) 
}
# grid.arrange() comes from gridExtra
library(gridExtra)
grid.arrange(grobs = g_list, ncol = 2, respect = TRUE)

2. Center-adjusted plots: the x/y scale is right within each plot, but not consistent across plots.
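To see why the scales end up inconsistent, it can help to print the width of the range each plot receives. The sketch below repeats the median-centering helper and applies it to made-up x coordinates for two animals (the `toy` data are invented for illustration, not taken from the real data set).

```r
# Same idea as expand_1D_center above: median-centered limits for one axis
expand_1D_center <- function(vec) {
  center <- median(vec)
  new_diff <- max(center - min(vec), max(vec) - center)
  c(new_min = center - new_diff, new_max = center + new_diff)
}

# Made-up x coordinates for two animals (illustrative only)
toy <- data.frame(
  x   = c(1, 2, 3, 10, 0, 1, 2),
  ind = c("A", "A", "A", "A", "B", "B", "B")
)

# Width of the x range that each individual's plot would get
x_span <- sapply(split(toy$x, toy$ind),
                 function(v) unname(diff(expand_1D_center(v))))
x_span
# A spans 15 units and B only 2, so panels of equal size show different scales
```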

I hope this is clearer now. My first post didn't state the problem clearly because I was focused on the current problem and left out the history, which is needed to explain our requirements.

@Mark Peterson's answer seems to solve this problem; I'll read the code further to verify.

Thanks!

EDIT: to give some context, I added the plots from the real data here:

The overview plot with all gulls in one plot. Note that some outliers stretch the plot.

This is the facet plot, which is useful because everything is aligned.

These are the individual plots, each with the correct scale but not aligned across plots.

This one has each plot centered on its centroid. I plan to zoom in on them all at the same time. The only problem is that the scales are not consistent across plots.

EDIT: I tried @Mark Peterson's code on my data. The plots are consistent, but it cropped some points, probably because my data has much larger values, so the original padding is not big enough.

Mark uses the maximum x range (and likewise y range) across all plots for every plot, so every plot has the same range. My code tried to fit every plot to its own pattern, but placing them in a grid with consistent scales would require shrinking the plot with the biggest canvas or padding the smallest plot. Setting the range of every plot to the same size has a similar effect but is much simpler to implement.
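A rough sketch of how the two ideas might be combined: keep the median centering from the question, but give every individual the same range width, as in the answer. The padding here is proportional to the range rather than a fixed 0.5, which is an assumption aimed at the cropping seen with large coordinate values. The helper `common_limits`, its `pad` argument and the `toy` data are all invented for illustration and appear in neither post.

```r
# Hypothetical helper: limits of one common width, centered on each
# group's median. pad is a fraction of the half-width added on each side.
common_limits <- function(values, group, pad = 0.05) {
  groups <- split(values, group)
  centers <- sapply(groups, median)
  # largest distance from any group's median to one of its own points
  half_width <- max(mapply(function(v, m) max(abs(v - m)), groups, centers))
  half_width <- half_width * (1 + pad)
  data.frame(
    ind   = names(groups),
    start = unname(centers - half_width),
    end   = unname(centers + half_width)
  )
}

# Made-up x coordinates for two animals (illustrative only)
toy <- data.frame(
  x   = c(1, 2, 3, 10, 0, 1, 2),
  ind = c("A", "A", "A", "A", "B", "B", "B")
)

lims <- common_limits(toy$x, toy$ind, pad = 0.1)
lims
```

Calling it once for x and once for y gives per-individual `xlim` and `ylim` values for `coord_fixed()`. Because the half-width is the largest median-to-point distance over all individuals, no point should fall outside its own limits, and every plot covers the same number of units.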

dracodoc
  • It would be easier to help you if you provided some sort of [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input data. Show the code you've tried. I'm not exactly clear on why facet won't work for you. Are you trying to have different ranges for the different plots? Why not just have all set on the maximal range rather than leaving margins? – MrFlick Feb 09 '17 at 22:04
  • It would be very helpful if you could provide an example of your plot code so others can troubleshoot it. But, it sounds like `coord_cartesian()` might solve your problem. – sc_evans Feb 09 '17 at 22:06
  • Can you not use facets? That would be best, with space='free', and coord_fixed() – baptiste Feb 09 '17 at 22:28
  • Do you refer to `xlim` - `ylim`? – pacomet Feb 10 '17 at 07:15
  • I know I should provide a code example; it's just that simulating a good data example is not easy, and my own code has too many dependencies. I'll try to get one now. – dracodoc Feb 10 '17 at 14:08
  • @baptiste, I think you mean `scales = 'free'`. That conflicts with coord_fixed(). And I found that facets always try to make every plot the same size, which inherently conflicts with my requirement, because every plot needs a different size and aspect ratio. – dracodoc Feb 10 '17 at 14:13
  • I meant `space = 'free'` but it requires `scales = 'free'` to do something useful, and that breaks the aspect ratio unfortunately (arguably a bug, since `coord_fixed` doesn't hold its promise). – baptiste Feb 10 '17 at 20:35
  • @baptiste, thanks. I think `coord_fixed` is not going to work with facets, since faceting has its own rules and goal: make all plots the same size along at least one axis. Even with space and scales both free, it will still align all plots first, which is the purpose of faceting and the difference between it and an arbitrary grid arrangement. – dracodoc Feb 11 '17 at 15:00
  • @dracodoc I understand the _technical_ reason for this failure, but there's not even a warning that `coord_fixed` is being totally ignored in this situation. – baptiste Feb 11 '17 at 20:13

1 Answer

Alright, I think I have gotten my best guess at what you are asking, though I agree with @MrFlick that explicitly sharing data would be a huge help.

If you had simple data with all of your animals on the same basic grid, I am guessing you wouldn't be asking (at least not the way you are). That is, given these data:

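library(ggplot2)
library(dplyr)

set.seed(8675309)
df <-
  data.frame(
    x = runif(40, 1, 20)
    , y = runif(40, 100, 140)
    # factor() keeps as.numeric(ind) and levels(ind) working in R >= 4.0,
    # where data.frame() no longer converts strings to factors
    , ind = factor(sample(LETTERS[1:4], 40, TRUE))
  )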

This straightforward facet_wrap works:

ggplot(df , aes(x = x, y = y)) +
  geom_point() +
  facet_wrap(~ind) +
  coord_fixed()

to give a faceted plot with all four individuals on shared axes.

But, you said that facet_wrap solutions wouldn't work. So, I am guessing that you have data where each animal is in a different grid, like this (note, using dplyr here and much more below):

modDF <-
  df %>%
  mutate(x = x + as.numeric(ind)*10
         , y = y + as.numeric(ind)*20)

And that means that the above code (using modDF instead of df)

ggplot(modDF, aes(x = x, y = y)) +
  geom_point() +
  facet_wrap(~ind) +
  coord_fixed()

gives a plot

which has a ton of wasted space and doesn't look great. So, I think you are asking how to handle data like these. For that, I think what you need to do is calculate the largest range (in each axis) and then generate that range centered on the data for each individual. For that, I am relying heavily on dplyr to group_by individual and calculate the minimum and maximum x/y locations. Then, I calculate a number of additional columns to calculate the midpoint of the data for each individual, the size of the range, and then where the range should extend to be set to the largest width/height needed and be centered on that individual's data. Note that I am also padding these a little bit so that I can set expand = FALSE when I implement the ranges.

getRanges <-
  modDF %>%
  group_by(ind) %>%
  summarise(
    minx = min(x)
    , maxx = max(x)
    , miny = min(y)
    , maxy = max(y)
  ) %>%
  mutate(
    # Find mid points for range setting
    midx = (maxx + minx)/2
    , midy = (maxy + miny)/2
    # Find size of all ranges
    , xrange = maxx - minx
    , yrange = maxy - miny
    # Set X lims the size of the biggest range, centered at the middle
    , xstart = midx - max(xrange)/2 - 0.5
    , xend = midx + max(xrange)/2 + 0.5
    # Set Y lims the size of the biggest range, centered at the middle
    , ystart = midy - max(yrange)/2 - 0.5
    , yend = midy + max(yrange)/2 + 0.5
    )

gives

     ind     minx     maxx     miny     maxy     midx     midy   xrange   yrange   xstart     xend   ystart     yend
  <fctr>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
1      A 14.91873 29.53871 120.0743 157.6944 22.22872 138.8844 14.61997 37.62010 14.17717 30.28027 119.5743 158.1944
2      B 22.50432 37.27647 153.5654 179.0589 29.89039 166.3122 14.77215 25.49352 21.83884 37.94195 147.0021 185.6222
3      C 32.15187 47.08845 165.9829 195.0261 39.62016 180.5045 14.93658 29.04320 31.56861 47.67171 161.1945 199.8146
4      D 44.49392 59.59702 192.7243 214.5523 52.04547 203.6383 15.10310 21.82806 43.99392 60.09702 184.3283 222.9484

Then, I loop through each individual, generating the plot needed and setting the range to what was calculated for that individual. (You could use ggtitle instead of facet_wrap but I like the strip effect from facet_wrap.)

sepPlots <- lapply(levels(modDF$ind), function(thisInd){
  thisRange <-
    filter(getRanges, ind == thisInd)

  modDF %>%
    filter(ind == thisInd) %>%
    ggplot(aes(x = x, y = y)) +
    geom_point() +
    coord_fixed(
      xlim = c(thisRange$xstart, thisRange$xend)
      , ylim = c(thisRange$ystart, thisRange$yend)
      , expand = FALSE
    ) +
    # ggtitle(thisInd)
    facet_wrap(~ind)
})

Then, I use plot_grid from cowplot to arrange the plots together. Note that loading cowplot sets a theme. So, I am resetting the theme because I am not a huge fan of the one from cowplot.

library(cowplot)
theme_set(theme_gray())

plot_grid(plotlist = sepPlots)

gives a grid of four plots, each covering the same number of x and y units but centered on its own data.

From there, you can play around with scales and axis labels as you see fit.

Mark Peterson
  • Thanks for the detailed answer. I will get some data and code posted and see if it works. I tried cowplot first but found it turned off the background grid and affected all other plots, so I chose gridExtra instead. – dracodoc Feb 10 '17 at 14:19
  • As in my answer, `cowplot` sets its own custom theme at load; you can revert back using `theme_set` as I did here. If the data I have here matches your use case, feel free to use it directly in your question. – Mark Peterson Feb 10 '17 at 14:22
  • Thanks Mark, I updated my question with some code and pic. My original question didn't state all the history of my attempts so the question was not clear. I'll look at your code in detail to verify if it works with my need. Thanks for the detailed answer. – dracodoc Feb 10 '17 at 15:01
  • OK @Mark Peterson, I read your code and tested it with my data. If I'm understanding correctly, you are using the max xrange across all plots for each plot, so every plot has the same range. This is the difference between my code and yours. Your code actually works better and solved my problem. Thanks! – dracodoc Feb 10 '17 at 15:33
  • Yes, I believe you are understanding correctly. Each plot has the same number of units along the x-axis (and along the y-axis) set to fit the largest data range, but each plot is centered at a different location. So, scale/distance is consistent across plots, but the raw locations are not. – Mark Peterson Feb 10 '17 at 16:01