4

I have a question for you please :

My data :

    Nb_obs <- as.vector(c( 2,  0,  6,  2,  7,  1,  8,  0,  2,  1,  1,  3, 11,  5,  9,  6,  4,  0,  7,  9))
    Nb_obst <- as.vector(c(31, 35, 35, 35, 39, 39, 39, 39, 39, 41, 41, 42, 43, 43, 45, 45, 47, 48, 51, 51))
    inf20 <- as.vector(c(2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 3, 4, 4, 3, 5, 4))
    sup20 <- as.vector(c(3, 4, 4, 4, 5, 4, 4, 5, 4, 4, 5, 5, 5, 6, 5, 6, 6, 5, 7, 6))
    inf40 <- as.vector(c(1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 2, 3, 3, 3, 4, 3))
    sup40 <- as.vector(c(4, 5, 5, 5, 6, 5, 5, 6, 5, 5, 6, 6, 6, 7, 6, 7, 7, 7, 9, 7))
    inf60 <- as.vector(c(1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 2))
    sup60 <- as.vector(c(5, 6, 6,  6,  8,  7,  7,  7,  7,  7,  7,  7,  8,  9,  8,  9,  9,  9, 11,  9))
    inf90 <- as.vector(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1))
    sup90 <- as.vector(c(10, 11, 11, 11, 15, 13, 13, 14, 12, 13, 13, 13, 14, 17, 15, 17, 17, 16, 21, 18))

data <- cbind.data.frame(Nb_obs, Nb_obst, inf20, sup20, inf40, sup40, inf60 , sup60, inf90 , sup90)

My plot :

plot(data$Nb_obst, data$Nb_obs, type = "n",  xlab = "Number obst", ylab = "number obs", ylim = c(0, 25))

lines(data$Nb_obst, data$inf20, col = "dark red")
lines(data$Nb_obst, data$sup20, col = "dark red")

lines(data$Nb_obst, data$inf40, col = "red")
lines(data$Nb_obst, data$sup40, col = "red")

lines(data$Nb_obst, data$inf60, col = "dark orange")
lines(data$Nb_obst, data$sup60, col = "dark orange")

lines(data$Nb_obst, data$inf90, col = "yellow")
lines(data$Nb_obst, data$sup90, col = "yellow")

My question :

There are two things I'd like to do (and so I think it could be done by ggplot):

In the idea of the graph at the top, the "inf" and "sup" are limits of my model in the IC 20%, then 40%, then 60%, and finally 90%. I would first like to smooth each curve, and then I would like to color the surface between two curves of the same IC, for example that the surface between "data$inf90" and "data$sup90" is yellow, the area between "data$inf60" and "data$60" is orange, etc. And I would like to superimpose each of these colored surfaces + put the good legend please.

Thanks for your help !

thomas leon
  • 153
  • 11

2 Answers2

9

Cool question since I had to give myself a crash course in using LOESS for ribbons!

First thing I'm doing is getting the data into a long shape, since that's what ggplot will expect, and since your data has some characteristics that are kind of hidden within values. For example, if you gather into a long shape and have, say a column key, with a value of "inf20" and another of "sup20", those hold more information than you currently have access to, i.e. the measure type is either "inf" or "sup", and the level is 20. You can extract that information out of that column to get columns of measure types ("inf" or "sup") and levels (20, 40, 60, or 90), then map aesthetics onto those variables.

So here I'm getting the data into a long shape, then using spread to make columns of inf and sup, because those will become ymin and ymax for the ribbons. I made level a factor and reversed its levels, because I wanted to change the order of the ribbons being drawn such that the narrow one would come up last and be drawn on top.

library(tidyverse)

data_long <- data %>%
  as_tibble() %>%
  gather(key = key, value = value, -Nb_obs, -Nb_obst) %>%
  mutate(measure = str_extract(key, "\\D+")) %>%
  mutate(level = str_extract(key, "\\d+")) %>%
  select(-key) %>%
  group_by(level, measure) %>%
  mutate(row = row_number()) %>%
  spread(key = measure, value = value) %>%
  ungroup() %>%
  mutate(level = as.factor(level) %>% fct_rev())

head(data_long)
#> # A tibble: 6 x 6
#>   Nb_obs Nb_obst level   row   inf   sup
#>    <dbl>   <dbl> <fct> <int> <dbl> <dbl>
#> 1      0      35 20        2     2     4
#> 2      0      35 40        2     2     5
#> 3      0      35 60        2     1     6
#> 4      0      35 90        2     0    11
#> 5      0      39 20        8     3     5
#> 6      0      39 40        8     2     6

ggplot(data_long, aes(x = Nb_obst, ymin = inf, ymax = sup, fill = level)) +
  geom_ribbon(alpha = 0.6) +
  scale_fill_manual(values = c("20" = "darkred", "40" = "red", 
      "60" = "darkorange", "90" = "yellow")) +
  theme_light()

But it still has the issue of being jagged, so for each level I predicted smoothed values of both inf and sup versus Nb_obst using loess. group_by and do yield a nested data frame, and unnest pulls it back out into a workable form. Feel free to adjust the span parameter, as well as other loess.control parameters that I know very little about.

data_smooth <- data_long %>%
  group_by(level) %>%
  do(Nb_obst = .$Nb_obst,
     inf_smooth = predict(loess(.$inf ~ .$Nb_obst, span = 0.35), .$Nb_obst), 
     sup_smooth = predict(loess(.$sup ~ .$Nb_obst, span = 0.35), .$Nb_obst)) %>%
  unnest() 

head(data_smooth)
#> # A tibble: 6 x 4
#>   level Nb_obst inf_smooth sup_smooth
#>   <fct>   <dbl>      <dbl>      <dbl>
#> 1 90         35      0           11. 
#> 2 90         39      0           13.4
#> 3 90         48      0.526       16.7
#> 4 90         39      0           13.4
#> 5 90         41      0           13  
#> 6 90         41      0           13

ggplot(data_smooth, aes(x = Nb_obst, ymin = inf_smooth, ymax = sup_smooth, fill = level)) +
  geom_ribbon(alpha = 0.6) +
  scale_fill_manual(values = c("20" = "darkred", "40" = "red", 
      "60" = "darkorange", "90" = "yellow")) +
  theme_light()

Created on 2018-05-26 by the reprex package (v0.2.0).

camille
  • 16,432
  • 18
  • 38
  • 60
  • ooooo, it looks like what I want ! Thanks. Now I have to understand the subtleties of this code :) – thomas leon May 26 '18 at 17:54
  • Cool, let me know if you need me to break the steps down any further. Generally when I'm trying to make sense of a lot of tidying code, I'll just run bits of it, adding back in one piped statement at a time, until I can trace everything that's going on – camille May 26 '18 at 17:58
  • OK, thanks ! This is the graph of my zone 1 in case 1. I have to do 5 more. I have to check how it all looks. – thomas leon May 26 '18 at 18:08
  • So if you have multiple similar plots to make, you'll probably want to facet them, and have additional variables going into the `group_by` for the loess – camille May 26 '18 at 18:12
  • Yes, I have 1 dataset by 1 case. I'm not good with ggplot, I thought I was using `par(mfrow = c(3,2))` – thomas leon May 26 '18 at 18:20
  • `par` functions are just for base R, but the equivalent in ggplot would be setting `facet_wrap` or `facet_grid`, depending on how many variables you need to split on – camille May 26 '18 at 18:40
3

This produces the plot with shaded areas using base R graphics.
The trick is to pair the x values with the y values.

plot(data$Nb_obst, data$Nb_obs, type = "n",  xlab = "Number obst", ylab = "number obs", ylim = c(0, 25))

lines(data$Nb_obst, data$inf20, col = "dark red")
lines(data$Nb_obst, data$sup20, col = "dark red")

lines(data$Nb_obst, data$inf40, col = "red")
lines(data$Nb_obst, data$sup40, col = "red")

lines(data$Nb_obst, data$inf60, col = "dark orange")
lines(data$Nb_obst, data$sup60, col = "dark orange")

lines(data$Nb_obst, data$inf90, col = "yellow")
lines(data$Nb_obst, data$sup90, col = "yellow")

with(data, polygon(c(Nb_obst, rev(Nb_obst)), c(inf90, rev(sup90)), col = "yellow"))
with(data, polygon(c(Nb_obst, rev(Nb_obst)), c(inf60, rev(sup60)), col = "dark orange"))
with(data, polygon(c(Nb_obst, rev(Nb_obst)), c(inf40, rev(sup40)), col = "red"))
with(data, polygon(c(Nb_obst, rev(Nb_obst)), c(inf20, rev(sup20)), col = "dark red"))

enter image description here

The code for a ggplot graph is a bit longer. There is a function geom_ribbon perfect for this.

g <- ggplot(data)
g + geom_ribbon(aes(x = Nb_obst, ymin = sup60, ymax = sup90), fill = "yellow") + 
    geom_ribbon(aes(x = Nb_obst, ymin = sup40, ymax = sup60), fill = "dark orange") + 
    geom_ribbon(aes(x = Nb_obst, ymin = sup20, ymax = sup40), fill = "red") + 
    geom_ribbon(aes(x = Nb_obst, ymin = inf20, ymax = sup20), fill = "dark red") + 
    geom_ribbon(aes(x = Nb_obst, ymin = inf40, ymax = inf20), fill = "red") + 
    geom_ribbon(aes(x = Nb_obst, ymin = inf60, ymax = inf40), fill = "dark orange") + 
    geom_ribbon(aes(x = Nb_obst, ymin = inf90, ymax = inf60), fill = "yellow")

enter image description here

Data.

I will redo your dataset, simplifying its creation. You don't need as.vector and if you are creating a data.frame there is no need for the data.frame method of cbind, data.frame(.) is enough.

Nb_obs <- c( 2,  0,  6,  2,  7,  1,  8,  0,  2,  1,  1,  3, 11,  5,  9,  6,  4,  0,  7,  9)
Nb_obst <- c(31, 35, 35, 35, 39, 39, 39, 39, 39, 41, 41, 42, 43, 43, 45, 45, 47, 48, 51, 51)
inf20 <- c(2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 3, 4, 4, 3, 5, 4)
sup20 <- c(3, 4, 4, 4, 5, 4, 4, 5, 4, 4, 5, 5, 5, 6, 5, 6, 6, 5, 7, 6)
inf40 <- c(1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 2, 3, 3, 3, 4, 3)
sup40 <- c(4, 5, 5, 5, 6, 5, 5, 6, 5, 5, 6, 6, 6, 7, 6, 7, 7, 7, 9, 7)
inf60 <- c(1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 2)
sup60 <- c(5, 6, 6,  6,  8,  7,  7,  7,  7,  7,  7,  7,  8,  9,  8,  9,  9,  9, 11,  9)
inf90 <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1)
sup90 <- c(10, 11, 11, 11, 15, 13, 13, 14, 12, 13, 13, 13, 14, 17, 15, 17, 17, 16, 21, 18)

data <- data.frame(Nb_obs, Nb_obst, inf20, sup20, inf40, sup40, inf60 , sup60, inf90 , sup90)
Rui Barradas
  • 70,273
  • 8
  • 34
  • 66
  • Thanks a lot ! Can we "smooth" each lines with plot() and lines() or we need use other function like ggplot function please ? – thomas leon May 26 '18 at 17:24
  • @thomasleon To smooth the lines you will have to smooth the data, using, for instance, the output of `loess(Nb_obst, sup90)`, etc, or of another smoothing function. – Rui Barradas May 26 '18 at 17:31