I've got survival/sampling dates for over 500 dogs. Each dog was sampled at least once, and several were sampled three or four times. For example:

Microchip_number    Date       Sampling_occasion

White notched fatso 20,11,2018 First
White notched fatso 28,12,2018 Second
White notched fatso 09,04,2019 Third
White notched fatso 23,10,2019 Fourth
Tuttu Jeevan        06,12,2018 First
Tuttu Jeevan        03,01,2019 Second
Tuttu Jeevan        04,05,2019 Third
Tuppy               22,10,2018 First
Tuppy               20,11,2018 Second
Tuppy               17,04,2019 Third
Tuppy               31,07,2019 Lost to study

I've managed to plot this in ggplot, but it's a very large image which requires zooming in and scrolling to view the sampling times of each individual dog.

Plot of outcomes for all dogs

I've found suggestions to split large dataframes based on a certain variable (e.g. month) or to use facet_wrap, but in my case, I don't have any such variable to use. Is there a way to split this large plot into multiple smaller plots that don't need to be zoomed in to view all the details clearly, such as below (without having to separately plot subsets of the dataframe)?

How I'd like each split/sub-plot to appear

This is the code I'm using

library(readxl)   # read_xlsx()
library(ggplot2)

outcomes <- read_xlsx("Dog outcomes.xlsx", col_types = c("text", "date", "text"))

outcomes$Microchip_number <- as.factor(outcomes$Microchip_number)

outcomes$Sampling_occasion <- factor(outcomes$Sampling_occasion,
                                     levels = c("First", "Second", "Third", "Fourth", "Lost to study", "Died"))

g <- ggplot(outcomes)

g + geom_point(aes(x = Date, y = Microchip_number, colour = Sampling_occasion, shape = Sampling_occasion)) +
  geom_line(aes(x = Date, y = Microchip_number, group = Microchip_number, colour = Sampling_occasion)) +
  theme_bw()
SR1614
    Facets will still give you one big plot, just divided into smaller plots. You'll need to zoom in to see anything. I think the best way to go is to find a different visualization that does some aggregation, but if you really want all the dogs plotted in assorted little plots, run a for loop to plot 20 dogs at a time and save each plot in a file. – Gregor Thomas Dec 03 '19 at 17:05

2 Answers

Thanks so much, Jrm FRL, the code to add the counter and subgroup columns was exactly what I needed! As Gregor mentioned, facet_wrap just made things more difficult to view, so I used a for loop using subgroup to plot 50 dogs per pdf page (or any other device). This is the code I used, and it's worked perfectly, although for some reason, the 'Microchip_number's are displaying in reverse sequence / alphabetical order (68481, 68480, 68479 etc.), despite being organised the other way round in the main dataframe 'outcomes'. Minor quibble, however! This makes it so much easier to visualise outcomes for specific individuals. Cheers!

library(dplyr)
library(ggplot2)

outcomes2 <- outcomes %>%
  mutate(counter = 1 + cumsum(c(0, as.numeric(diff(Microchip_number)) != 0)), # this counter starts at 1 and increments for each new dog
         subgroup = as.factor(ceiling(counter / 50)))                        # 50 dogs per subgroup

pdf(file = "All_outcomes_50.pdf") # one page per subgroup
for (i in seq_along(levels(outcomes2$subgroup))) {
  # keep only the dogs belonging to the current subgroup
  df <- outcomes2 %>%
    filter(subgroup == i)

  wow <- ggplot(df) +
    geom_point(aes(x = Date, y = Microchip_number, colour = Sampling_occasion, shape = Sampling_occasion)) +
    geom_line(aes(x = Date, y = Microchip_number, group = Microchip_number, colour = Sampling_occasion)) +
    theme_bw()
  print(wow) # inside a loop, ggplot objects must be printed explicitly
}
dev.off()

New plot after using 'for' loop

SR1614
  • The order of the axis is expected. Lower values go at the bottom and higher values go at the top, regardless of whether the axis is numeric or discrete. `ggplot` almost never cares about the order of data in the data frame; for factors it's all about the order of the levels. [See this FAQ if you want to change it](https://stackoverflow.com/q/42710056/903061). – Gregor Thomas Dec 04 '19 at 15:07
  • Thanks so much Gregor! `forcats::fct_rev()` did the job perfectly! – SR1614 Dec 05 '19 at 14:30
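
For readers following the comments above, here is a minimal sketch of what reversing the factor levels does. It uses base R only, which should be equivalent to `forcats::fct_rev()` for this purpose; the three chip numbers are taken from the answer text, and the variable names are made up for illustration.

```r
# Toy factor using the chip numbers quoted in the answer (illustrative only)
chips <- factor(c("68479", "68480", "68481"))

# ggplot draws the first level at the bottom of a discrete y axis,
# so reversing the levels flips the order in which dogs appear
chips_rev <- factor(chips, levels = rev(levels(chips)))

print(levels(chips))
print(levels(chips_rev))
```

Only the level order changes; the values stored in the column stay the same, so nothing else in the plotting code needs to be touched.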

You can simply divide your dataset into subgroups containing the same number of dogs (e.g. 10). Add an intermediate counter column to get around the small difficulty that each dog does not necessarily have the same number of rows.

I would suggest:

library('dplyr')
outcomes <- outcomes %>% 
  mutate(counter = 1 + cumsum(c(0,as.numeric(diff(Microchip_number))!=0)), # this counter starting at 1 increments for each new dog
         subgroup = as.factor(ceiling(counter/10)))

You will obtain a new dataset with a factor column, subgroup, whose value changes every 10 dogs. Then just add + facet_wrap(. ~ subgroup) to your plot.
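
As a rough sketch of how the faceted version might look, the example below builds a small toy data frame from the first rows of the question's sample data and groups 2 dogs per panel instead of 10. The name `outcomes_demo`, the group size, and the use of `match()` for the counter are assumptions for illustration, not part of the original code. `scales = "free_y"` is added so that each panel lists only its own dogs.

```r
library(ggplot2)

# Toy data: a few rows copied from the sample in the question
outcomes_demo <- data.frame(
  Microchip_number = c("Tuppy", "Tuppy",
                       "White notched fatso", "White notched fatso",
                       "Tuttu Jeevan", "Tuttu Jeevan"),
  Date = as.Date(c("2018-10-22", "2018-11-20",
                   "2018-11-20", "2018-12-28",
                   "2018-12-06", "2019-01-03")),
  Sampling_occasion = factor(rep(c("First", "Second"), times = 3))
)

# Number the dogs in order of first appearance, then put 2 dogs in each subgroup
outcomes_demo$counter  <- match(outcomes_demo$Microchip_number,
                                unique(outcomes_demo$Microchip_number))
outcomes_demo$subgroup <- factor(ceiling(outcomes_demo$counter / 2))

# One panel per subgroup; free_y drops dogs that are not in the panel
p <- ggplot(outcomes_demo, aes(x = Date, y = Microchip_number)) +
  geom_line(aes(group = Microchip_number)) +
  geom_point(aes(colour = Sampling_occasion, shape = Sampling_occasion)) +
  facet_wrap(~ subgroup, scales = "free_y") +
  theme_bw()
```

Calling `print(p)` draws the plot. With several hundred dogs the panels will still share one image, so this mainly helps when the number of subgroups is small.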

Hope this will help.

Jrm_FRL
  • This is a nice idea - though from OP's data it looks like `microchip_number` is a misnomer, as it is a character or factor---so `diff` won't work on it. You could instead do `counter = as.integer(factor(Microchip_number))`. I'd also strongly advise that OP sort the data by something meaningful before adding this counter. – Gregor Thomas Dec 03 '19 at 17:03
  • Indeed, I was surprised to see `diff` working with a character column. Example: `data.frame(x = c("a", "a", "a", "b", "c")) %>% mutate(counter = cumsum(c(1,as.numeric(diff(x))!=0)))` – Jrm_FRL Dec 03 '19 at 17:10
  • Interesting, though that is working on a `factor` column, not a character column (set `stringsAsFactors = FALSE` to see an error on a character column). I'm still confused because `diff(factor(c("a", "a", "b")))` gives an error; somehow the `c(1, ...)` coerces the factor to numeric before `diff` is applied. – Gregor Thomas Dec 03 '19 at 17:15
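
Given the uncertainty about `diff` in the comments above, here is a small sketch of two ways to build the counter without it, using base R only. The toy vector and the variable names are hypothetical. `match(x, unique(x))` numbers dogs in order of first appearance, while `as.integer(factor(x))` (the suggestion from the comments) numbers them by sorted level.

```r
# Hypothetical toy data: dog names as plain character strings, several rows per dog
dogs <- c("Tuppy", "Tuppy", "Fatso", "Fatso", "Fatso", "Jeevan")

# Number each dog in order of first appearance (works for character or factor input)
counter_by_appearance <- match(dogs, unique(dogs))

# Number each dog by its sorted factor level instead
counter_by_level <- as.integer(factor(dogs))

# Group dogs into pages of `dogs_per_page` (an illustrative value)
dogs_per_page <- 2
subgroup <- ceiling(counter_by_appearance / dogs_per_page)

print(counter_by_appearance)
print(counter_by_level)
print(subgroup)
```

Neither version requires a dog's rows to be contiguous, which the `diff`-based counter does. Which numbering is preferable depends on whether the data frame has already been sorted into a meaningful order.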