Probably trivial but tricky for me to get it right.

Given the start and end date in A, as well as duration in months between the date range:

A=
structure(list(start..yyyy.mm. = c(197901L, 197901L, 197901L, 
    197901L, 197901L), X.yyyy.mm. = c(197901L, 197904L, 197908L, 
    197902L, 197902L), duration = c(1L, 4L, 8L, 2L, 2L), area..km.2. = structure(c(1L, 
    2L, 4L, 3L, 5L), .Label = c("46952.85", "c(125267.7, 72379.43, 72468.91, 13200.26)", 
    "c(19814.74, 39570.96)", "c(26513.05, 26513.05, 26513.05, 26513.05, 26513.05, 19898.57, 26513.05, 26513.05)", 
    "c(52291.77, 52291.77)"), class = "factor")), .Names = c("start..yyyy.mm.", 
    "X.yyyy.mm.", "duration", "area..km.2."), class = "data.frame", row.names = c(NA, 
    -5L))

I would like to produce something similar to the plot shown below (ignore the histogram), with each duration colored differently. In A, the first area value corresponds to the first month in the date range, the second value to the second month, and so on.
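To make the month-to-area pairing concrete, here is a minimal base R sketch of how each row could be expanded into one row per month. The names `A_demo`, `parse_area` and `expand_event` are hypothetical, and the area strings are stored as plain characters here; with the real factor column you would wrap them in `as.character()` first.

```r
# Two rows copied from A (hypothetical demo object; areas kept as character strings)
A_demo <- data.frame(
  start = c(197901L, 197901L),
  duration = c(1L, 4L),
  area = c("46952.85", "c(125267.7, 72379.43, 72468.91, 13200.26)"),
  stringsAsFactors = FALSE
)

# Pull every number out of one string, e.g. "c(1.5, 2)" becomes c(1.5, 2)
parse_area <- function(x) {
  as.numeric(regmatches(x, gregexpr("[0-9]+\\.?[0-9]*", x))[[1]])
}

# Build one row per month of event i: the k-th area goes with the k-th month
expand_event <- function(i) {
  first_month <- as.Date(paste0(A_demo$start[i], "01"), format = "%Y%m%d")
  data.frame(
    event = i,
    date = seq(first_month, by = "month", length.out = A_demo$duration[i]),
    area = parse_area(A_demo$area[i])
  )
}

long <- do.call(rbind, lapply(seq_len(nrow(A_demo)), expand_event))
```

This assumes that `duration` always equals the number of area values in the row, which holds for the sample data above.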

The dates in A are not continuous, as you can see. The intention is therefore to create a continuous date axis such as ts <- seq(as.Date("1910-01-01"), as.Date("2015-12-31"), by="month") and shade areas according to the start and end dates for a given duration.

Date ranges where no values were recorded should have NA.
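As a rough sketch of what "NA where nothing was recorded" could look like in practice, the continuous axis can be turned into a one-column data frame and left-joined with the observations. The `obs` data frame below is a hypothetical long-format stand-in (one row per recorded month), not the real data, and base R `merge` is used only to keep the example dependency-free.

```r
# Continuous monthly axis, as described in the question
ts <- seq(as.Date("1910-01-01"), as.Date("2015-12-31"), by = "month")

# Hypothetical long-format observations: one row per recorded month
obs <- data.frame(
  date = as.Date(c("1979-01-01", "1979-02-01")),
  area = c(19814.74, 39570.96)
)

# Left join onto the full axis: months without a record get area = NA
full <- merge(data.frame(date = ts), obs, by = "date", all.x = TRUE)
```

Note that the join needs a data frame with a shared key column on both sides; a bare Date vector cannot be joined directly.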

How can I implement this in R using any package?

The first idea that came to mind was to create a continuous date axis and join it to A:

library(dplyr)
data_with_missing_times <- full_join(ts,A)

and then do the plotting? A similar question is here, but in my case I intend to shade date ranges. My data goes from 1910 to 2015, with missing date ranges at some intervals.

Thank you.

sample plot to reproduce

code123
  • Are the months in `A` in yearmonth format? EDIT: sorry yes I see the headings – Calum You Apr 24 '18 at 18:42
  • Can you explain what's up with the `area..km.2` column? I'm confused by there being levels like "c(125267.7, 72379.43, 72468.91, 13200.26)" as one string. Did you want to make this a nested list column or something? – camille Apr 24 '18 at 19:45
  • @camille: I guess you need to use `separate_rows` to break it down for each month – Tung Apr 24 '18 at 19:50
  • @camille In `A`, the first `area` value corresponds to the first month in the date range, and so on. For example, in row 2 the area for `197901` is `125267.7`, for `197902` it is `72379.43`, etc. – code123 Apr 24 '18 at 20:18

1 Answer

I am not sure exactly what you wanted to plot, but here is something that does the trick. It's odd that you have the areas in factor form rather than as a list-column, since that forces separate_rows and filter rather than a simple unnest. The main step is adding an extra row to each group, so that even the event with duration 1 has two date values (a start and an end), and then filling in the right dates based on those groupings. That lets us plot the date ranges using geom_ribbon or geom_area, whichever you prefer.
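For comparison, here is a small sketch of the list-column alternative mentioned above. `A_list` is a hypothetical reshaping of two of the rows, not something present in the question's data; it only illustrates why `unnest` would be simpler if the areas were stored as numeric vectors.

```r
library(tidyr)

# Hypothetical version of A with the areas stored as a list-column
A_list <- data.frame(
  start = as.Date(c("1979-01-01", "1980-01-01")),
  duration = c(1L, 2L)
)
A_list$area <- list(46952.85, c(19814.74, 39570.96))

# One row per area value, with no string splitting or filtering required
long <- unnest(A_list, cols = area)
```

With the factor column from the question this shortcut is not available, which is why the code below splits the strings instead.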

EDIT: This approach avoids creating a row for every month in the time series; it only creates observations where there are areas to plot. If you want to extend the limits of the x-axis, you can call scale_x_date and change the limits; otherwise the axis scales automatically to where the data are. I also changed the input data so that none of the ranges overlap (each event now starts in a different year), and changed the ribbon plot to match.

library(tidyverse)
A <- structure(list(start..yyyy.mm. = c(197901L, 197901L, 197901L,197901L, 197901L), X.yyyy.mm. = c(197901L, 197904L, 197908L,197902L, 197902L), duration = c(1L, 4L, 8L, 2L, 2L), area..km.2. = structure(c(1L,2L, 4L, 3L, 5L), .Label = c("46952.85", "c(125267.7, 72379.43, 72468.91, 13200.26)","c(19814.74, 39570.96)", "c(26513.05, 26513.05, 26513.05, 26513.05, 26513.05, 19898.57, 26513.05, 26513.05)","c(52291.77, 52291.77)"), class = "factor")), .Names = c("start..yyyy.mm.","X.yyyy.mm.", "duration", "area..km.2."), class = "data.frame", row.names = c(NA,-5L))

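# Give each event its own start year so the ranges no longer overlap,
# then split the area strings into one row per value
tbl <- A %>%
  mutate(start = seq.Date(as.Date("1979-01-01"), by = "year", length.out = 5)) %>%
  select(start, duration, area = area..km.2.) %>%
  rowid_to_column() %>%
  separate_rows(area) %>%
  # separate_rows leaves the leading "c" and empty strings behind; drop them
  filter(!area %in% c("c", ""))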

# Row indices per event, each followed by an NA that becomes the extra end-point row
indices <- seq(nrow(tbl)) %>%
  split(group_indices(tbl, rowid)) %>%
  map(~ c(.x, NA)) %>%
  unlist()

tbl <- tbl[indices, ] %>%
  fill(rowid, start, duration, area) %>%
  group_by(rowid) %>%
  mutate(
    date = seq.Date(
      from = first(start),
      by = "month",
      length.out = first(duration) + 1
    ),
    area = as.numeric(area)
  ) %>%
  ungroup()

ggplot(tbl) +
  geom_ribbon(aes(x = date, fill = factor(rowid), ymax = 1, ymin = 0))

ggplot(tbl) +
  geom_area(
    mapping = aes(x = date, y = area, fill = factor(rowid)),
    alpha = 0.3,
    position = "identity"
    ) +
  scale_x_date(limits = c(as.Date("1979-01-01"), Sys.Date()))

Created on 2018-04-24 by the reprex package (v0.2.0).

Calum You
  • Thank you for these insights. I would like to plot the data above just like on the image I added. My actual data is a time series for the period `ts <- seq(as.Date("1910-01-01"), as.Date("2015-12-31"), by="month")`. So, if you can show the location `A` on `ts` just like any other time series plot, it will solve my problem. Other date intervals with `ts` without values should have `NA`. I guess the image above shows the time series better. Ignore the histogram. I can generate that for sure. – code123 Apr 24 '18 at 22:06
  • You said ignore the histogram so I don't know what exactly you want to plot. How do you want to deal with overlaps? Do you just want to extend the range of the x-axis? – Calum You Apr 24 '18 at 22:07
  • Oh! I see what you mean. Can you just remove the overlaps then illustrate how to go about plotting the time series as in the image? I will then deal with the issue of overlaps. Thank you. – code123 Apr 24 '18 at 23:24