
I have daily flow data in a dataset I've called "dat1_na".

It spans from about 1940 to about 2020, so there are 18,780 rows in this dataset.

The output of str(dat1_na) is:

'data.frame':   18780 obs. of  9 variables:
 ...
 $ MLd    : num  96 34 34 20 34 34 52 34 34 26 ...
 $ Date   : Date, format: "1943-09-19" "1943-09-07" "1943-09-08" "1943-09-11" ...
 ...
 $ Climate: chr  "Dry" "Dry" "Dry" "Dry" ...

So it's a simple line graph (hydrograph) of MLd (the daily flow rate) against time, which is no problem. However, I'm trying to shade the background with geom_rect according to the Climate column, which has only two possible values: "Dry" and "Wet". The issue is that I can't get the background to show up properly. I know the data is being read correctly, because if I tweak my code a bit I can see the dry and wet years where they should be:

ggplot(dat1_na, aes(x=Date, y=MLd, xmin=Date, xmax=Date, ymin=0, ymax=6000)) + 
  geom_line(colour = "#231EDC") + 
  geom_rect(aes(colour=Climate), alpha=0.2) +
  theme_minimal() 

[Image: graph using aes(colour=Climate)]

What I really want is for the shading to be transparent and sit behind the line graph, but I can't get it working. I've tried several versions of the code, putting things in the ggplot() call or in aes(), but nothing really works. I have code that I think should work, but nothing from geom_rect shows up (except in the legend, which looks correct).

ggplot(dat1_na, aes(x=Date, y=MLd, xmin=Date, xmax=Date, ymin=0, ymax=6000)) + 
  geom_line(colour = "#231EDC") + 
  geom_rect(aes(fill=Climate), linetype=0, alpha=0.2) +
  theme_minimal() 

[Image: graph using aes(fill=Climate)]

I'm wondering if it's to do with the number of rows in my data (~18,000): perhaps each rectangle is too small for its fill to show, and only the outline is large enough to be visible. The trouble is that I can't make the outline transparent. I assume the code draws one rectangle per row, either pink or green depending on the value of dat1_na$Climate.

Does anyone have any suggestions?

Cheers

Greg
  • It would be easier to help you if you provide [a minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) including a snippet of your data or some fake data to reproduce your issue. If you want to post your data use `dput()`. If your dataset has a lot of observations you could do e.g. `dput(head(NAME_OF_DATASET, 20))` for the first 20 rows of data. – stefan Apr 04 '22 at 07:42
  • https://stackoverflow.com/questions/67334121/geom-ribbon-multiple-start-stop-timepoints – danlooo Apr 04 '22 at 07:50

1 Answer


It's difficult to demonstrate without a reproducible example, so let's create one with the same column names and types as your own data:

set.seed(8)

dat1_na <- data.frame(MLd = 40 + cumsum(sample(seq(-5, 5), 100, TRUE)),
                      Date = sort(as.Date(sample(seq(-9601, 9601), 100, TRUE),
                                          origin = '1970-01-01')),
                      Climate = c('Dry', 'Wet')[round(1.5 + 
                                  cumsum(runif(100, -0.01, 0.01)))])
dat1_na

The key here is to create a small data frame for the rectangles, based on the start and end dates of each change in Climate:

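library(tidyverse)

rect_frame <- dat1_na %>%
  arrange(Date) %>%
  # flag rows where Climate differs from the previous row
  mutate(change = lag(Climate) != Climate, 
         # always keep the first row (start of the first period) and the
         # last row (it supplies the end date of the final period)
         change = c(TRUE, change[-c(1, nrow(.))], TRUE)) %>%
  # keep only the period starts, plus the final row
  filter(change) %>%
  # each period ends where the next one begins (the final row's End_Date is NA)
  mutate(End_Date = lead(Date))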
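If you would rather not depend on dplyr, a similar frame can probably be built in base R with rle(), which returns the lengths and values of runs of identical consecutive values. This is a sketch, not part of the original answer: the helper name make_rect_frame and the toy data are hypothetical, and it assumes Climate is a character column (as in the str() output above); convert a factor with as.character() first.

```r
# Sketch: base-R version of the rectangle frame using rle().
# make_rect_frame and toy are hypothetical names for illustration only.
make_rect_frame <- function(df) {
  df <- df[order(df$Date), ]            # sort chronologically
  runs <- rle(df$Climate)               # runs of consecutive "Dry"/"Wet"
  ends <- cumsum(runs$lengths)          # row index where each run ends
  starts <- c(1, head(ends, -1) + 1)    # row index where each run starts
  n <- nrow(df)
  data.frame(
    Climate  = runs$values,
    Date     = df$Date[starts],         # start date of each period
    # each period ends where the next begins; the last ends at the final date
    End_Date = df$Date[c(starts[-1], n)],
    stringsAsFactors = FALSE
  )
}

# Hypothetical toy data with the same column names as dat1_na
toy <- data.frame(
  Date    = as.Date("2000-01-01") + 0:9,
  MLd     = c(5, 6, 7, 8, 9, 8, 7, 6, 5, 4),
  Climate = c("Dry", "Dry", "Dry", "Wet", "Wet", "Wet", "Wet", "Dry", "Dry", "Dry"),
  stringsAsFactors = FALSE
)

rect_frame2 <- make_rect_frame(toy)
print(rect_frame2)
```

Because the columns are named Climate, Date and End_Date, the result could be passed to geom_rect in place of rect_frame. Unlike the dplyr pipeline, it does not produce an extra final row with a missing End_Date.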

Now, when we plot, make sure the rect layer is drawn first so that it sits behind the line. It should use the fill aesthetic rather than the colour aesthetic, and it needs rect_frame passed as its data argument:

ggplot(dat1_na, aes(x = Date, y = MLd)) + 
  geom_rect(data = rect_frame, 
            aes(fill = Climate, xmin = Date, xmax = End_Date, 
                ymin = -Inf, ymax = Inf), alpha = 0.2) +
  geom_line(colour = "#231EDC") + 
  theme_minimal()


Allan Cameron
  • Thanks. OK, that solution and the one danlooo linked to both created a new data frame with start and end dates for each climate status and then drew the geom_rect from that new data frame rather than the hydrograph's data frame. I think this confirms my suspicion that my code was trying to draw ~18,000 little rectangles (which is why only the outline was showing up). – Greg Apr 04 '22 at 23:06
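The diagnosis in the comment above can be checked with a few lines of base R. This sketch uses small hypothetical toy data rather than the real 18,780-row record: in the original call xmin and xmax both map to Date, so every row becomes a rectangle of zero width, while the answer's approach draws one rectangle per run of the same Climate value.

```r
# Sketch: compare how many rectangles each approach draws.
# toy is hypothetical stand-in data, not the real dat1_na.
toy <- data.frame(
  Date    = as.Date("2000-01-01") + 0:9,
  MLd     = c(5, 6, 7, 8, 9, 8, 7, 6, 5, 4),
  Climate = rep(c("Dry", "Wet"), each = 5),
  stringsAsFactors = FALSE
)

# Original call: xmin = Date and xmax = Date, so each rectangle has zero
# width; its fill covers no area and only a border stroke can be seen.
row_widths <- as.numeric(toy$Date - toy$Date)
rects_per_row <- nrow(toy)

# Answer's approach: one rectangle per consecutive run of the same Climate.
rects_per_period <- length(rle(toy$Climate)$lengths)

print(rects_per_row)     # [1] 10
print(rects_per_period)  # [1] 2
```

On the real data the first count would be 18,780 and the second only the number of distinct dry/wet periods.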