
I am analyzing monthly observations of water input (rainfall) and output (evaporation) at a given location.

I need to plot time series of both rainfall and evaporation and shade the area between the two lines, using a different color depending on which line is above the other.

This is what I have:

library(ggplot2)
library(reshape2)

dat1 <- structure(list(month = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 
                                10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 
                                12L), value = c(226.638505697305, 186.533910906497, 141.106702957603, 
                                                93.4474376969313, 134.58903301495, 77.6436398653559, 77.301864710113, 
                                                69.7349071531699, 109.208227499776, 165.197186758555, 156.057081859175, 
                                                168.342059689587, 136.34266772667, 119.741309096806, 120.395245911241, 
                                                98.1418096019397, 72.4585192294772, 59.6209861948614, 69.6993145911677, 
                                                97.1585171469416, 118.357052089691, 132.74037278737, 139.141233379528, 
                                                146.583047731729), var = c("rainfall", "rainfall", "rainfall", 
                                                                           "rainfall", "rainfall", "rainfall", "rainfall", "rainfall", "rainfall", 
                                                                           "rainfall", "rainfall", "rainfall", "evaporation", "evaporation", 
                                                                           "evaporation", "evaporation", "evaporation", "evaporation", "evaporation", 
                                                                           "evaporation", "evaporation", "evaporation", "evaporation", "evaporation"
                                                )), row.names = c(NA, -24L), class = "data.frame")

ggplot(dat1, aes(x=month,y=value, colour=var)) +
  geom_line() + 
  scale_color_manual(values=c("firebrick1", "dodgerblue")) +
  theme_bw(base_size=18)

which yields the following graph (edited by hand to show what I'm trying to achieve):

[Image: the line plot above, edited to show the desired shading]

My initial attempt to fill the areas between the lines was based on this SO answer:

dat2 <- data.frame(month=1:12,
                   rainfall=dat1[dat1$var=="rainfall",]$value,
                   evaporation=dat1[dat1$var=="evaporation",]$value)
dat2 <- cbind(dat2, min_line=pmin(dat2[,2],dat2[,3]) ) 

dat2 <- melt(dat2, id.vars=c("month","min_line"), variable.name="var", value.name="value")

ggplot(data=dat2, aes(x=month, fill=var)) +
  geom_ribbon(aes(ymax=value, ymin=min_line)) +
  scale_fill_manual(values=c(rainfall="dodgerblue", evaporation="firebrick1"))

[Image: output of the geom_ribbon attempt]

However, it's not quite what I need.

How can I achieve the desired result?

thiagoveloso

1 Answer


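The reason you're getting the wrong shading is probably that the data are a bit on the coarse side. The lines cross between the monthly observations, and a ribbon built only from the monthly points cannot follow the lines around those crossings. My advice would be to interpolate the data first. This assumes `dat1` is the data frame from your example.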
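To make the coarseness concrete, here is a small base R sketch that finds where two series cross when joined by straight lines. The helper `crossings()` is hypothetical and not part of any package. The sketch uses the question's monthly values rounded to one decimal.

```r
# Hypothetical helper (not from any package): x positions where two series,
# joined by straight lines between observations, cross each other.
crossings <- function(x, a, b) {
  d <- a - b
  n <- length(d)
  # Strict sign flip between x[i] and x[i + 1]; exact ties at an observation are skipped
  i <- which(d[-n] * d[-1] < 0)
  # Linear interpolation of d between x[i] and x[i + 1], solved for d == 0
  x[i] + d[i] / (d[i] - d[i + 1]) * (x[i + 1] - x[i])
}

# Monthly values from the question, rounded to one decimal
rain <- c(226.6, 186.5, 141.1, 93.4, 134.6, 77.6, 77.3, 69.7, 109.2, 165.2, 156.1, 168.3)
evap <- c(136.3, 119.7, 120.4, 98.1, 72.5, 59.6, 69.7, 97.2, 118.4, 132.7, 139.1, 146.6)

round(crossings(1:12, rain, evap), 2)
# [1] 3.81 4.07 7.22 9.22
```

Every crossing falls strictly between two months. A denser grid of interpolated points lets the colour switch happen close to these positions instead of at a month.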

library(ggplot2)

# From long data to wide data
dat2 <- tidyr::pivot_wider(dat1, values_from = value, names_from = var)

# Setup interpolated data (tibble because we can then reference column x)
dat3 <- tibble::tibble(
  x = seq(min(dat2$month), max(dat2$month), length.out = 1000),
  rainfall    = with(dat2, approx(month, rainfall, xout = x)$y),
  evaporation = with(dat2, approx(month, evaporation, xout = x)$y)
)

Then we need a way to identify groups: each stretch where the same line stays on top should become its own ribbon. Here is a helper function for that. The group IDs are based on the runs in a run-length encoding.

# Make function to identify groups
rle_id <- function(x) {
  x <- rle(x)
  rep.int(seq_along(x$lengths), x$lengths)
}
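Here is a quick sketch of what `sign()` and `rle_id()` produce on a toy vector of differences. The values are made up for illustration and are not from the question. The first and last runs get the same fill value but different group IDs. Without a separate `group`, ggplot would draw one ribbon through all the rainfall-on-top points and bridge across the evaporation stretch in between.

```r
rle_id <- function(x) {
  x <- rle(x)
  rep.int(seq_along(x$lengths), x$lengths)
}

# Made-up differences (rainfall - evaporation) at six points
d <- c(5, 3, -1, -4, 2, 6)

s <- sign(d)    # fill:  1 1 -1 -1 1 1
g <- rle_id(s)  # group: 1 1  2  2 3 3

data.frame(d, fill = s, group = g)
```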

And now we can plot it.

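ggplot(dat3, aes(x)) +
  geom_ribbon(aes(ymin = pmin(evaporation, rainfall),                  # lower of the two lines
                  ymax = pmax(evaporation, rainfall),                  # upper of the two lines
                  group = rle_id(sign(rainfall - evaporation)),        # one ribbon per run
                  fill = as.factor(sign(rainfall - evaporation))))     # colour by which line is on top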

Created on 2021-02-14 by the reprex package (v1.0.0)

teunbrand
  • Thanks for the answer! The shaded areas look right now, but I don't quite get how the `rle_id` and `sign` functions are used. For example, how can I customize the colors for the positive and negative areas? – thiagoveloso Feb 14 '21 at 23:11
  • The `sign()` function just returns -1, 0 or 1 depending on the sign of the input. The `rle_id()` function assigns one group ID per run of identical values; for example, `c(-1,-1,0,1,1,1,-1)` becomes `c(1,1,2,3,3,3,4)`. If you want to know more, the term to google is 'run length encoding'. The fill colours are customisable like any ggplot fill with the `scale_fill_*()` functions, for example `scale_fill_manual(values = c("red", "blue"))`. – teunbrand Feb 14 '21 at 23:16
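The comment's advice can be combined into one sketch with named colours. This version makes a few swaps from the answer:

- It uses a character label (`"rainfall"`/`"evaporation"`) instead of `sign()`, so the `scale_fill_manual()` values can be matched by name.
- It counts ties as rainfall. This is an assumption, not part of the answer.
- It uses base R `approx()` and `data.frame()` in place of tidyr/tibble.
- It uses the question's values rounded to one decimal.

```r
library(ggplot2)

# Monthly data from the question, rounded to one decimal, already in wide form
wide <- data.frame(
  month       = 1:12,
  rainfall    = c(226.6, 186.5, 141.1, 93.4, 134.6, 77.6,
                  77.3, 69.7, 109.2, 165.2, 156.1, 168.3),
  evaporation = c(136.3, 119.7, 120.4, 98.1, 72.5, 59.6,
                  69.7, 97.2, 118.4, 132.7, 139.1, 146.6)
)

# Dense grid so the colour switches happen close to the real crossings
x <- seq(1, 12, length.out = 1000)
dense <- data.frame(
  x           = x,
  rainfall    = approx(wide$month, wide$rainfall, xout = x)$y,
  evaporation = approx(wide$month, wide$evaporation, xout = x)$y
)

rle_id <- function(x) {
  x <- rle(x)
  rep.int(seq_along(x$lengths), x$lengths)
}

# Label which line is on top (ties counted as rainfall), then number the runs
dense$top <- ifelse(dense$rainfall >= dense$evaporation, "rainfall", "evaporation")
dense$run <- rle_id(dense$top)

p <- ggplot(dense, aes(x)) +
  geom_ribbon(aes(ymin = pmin(evaporation, rainfall),
                  ymax = pmax(evaporation, rainfall),
                  group = run, fill = top)) +
  geom_line(aes(y = rainfall), colour = "dodgerblue4") +
  geom_line(aes(y = evaporation), colour = "firebrick4") +
  # Named values: each label gets its colour regardless of factor order
  scale_fill_manual(values = c(rainfall = "dodgerblue", evaporation = "firebrick1"),
                    name = "On top") +
  labs(x = "month", y = "value") +
  theme_bw(base_size = 18)

p
```

Naming the entries of `values` avoids depending on the order of the factor levels. The unnamed `c("red", "blue")` from the comment is assigned to levels in order.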