I have a time series of policies that were adopted over the past few decades, and want to make a stacked area plot with cumulative policy counts, as they remain in force after adoption. I would like them to be grouped by organization, with time on the x and cumulative count on the y to show growth in policy adoption over time.

Data:

df <- data.frame(
  organization = c("a", "a", "c", "c", "a", "b"),
  year = c(1990, 1991, 1992, 1993, 1994, 1995),
  count = c(1, 1, 1, 0, 1, 1)
)

I have tried the following:

df %>%
  group_by(organization, year) %>%
  summarise(total = sum(count)) %>%
  ggplot(aes(x = year, y = cumsum(total), fill = factor(organization))) +
  geom_area(position = "stack")

Right now I get a plot like this that is not cumulative -- I think it is because for some years there is no policy adopted.

[image: plot produced by the code above]

I am interested in getting something that looks like this:

[image: example stacked area chart]

Image source: https://www.r-graph-gallery.com/136-stacked-area-chart.html

I would really appreciate any help!!!

Eric Leung
Alyssa C

3 Answers

2

For each organization, you'll want to make sure you have at least one value for counts for the minimum and maximum years. This is so that ggplot2 will fill in the gaps. Also, you'll want to be careful with cumulative sums. So the solution I've shown below adds in a zero count wherever no value exists, including the earliest and last year.

I've added some code so that you can automate the adding of rows for organizations that don't have data for all years of your data. To incorporate this automated code, you'll want to merge in the complete_dat data frame and change the dat variable (and the 1990:1995 year range) to suit your own data.

library(ggplot2)
library(dplyr)
library(tidyr)

# Create sample data
dat <- tribble(
  ~organization, ~year, ~count,
  "a", 1990, 1,
  "a", 1991, 1,
  "b", 1991, 1,
  "c", 1992, 1,
  "c", 1993, 0,
  "a", 1994, 1,
  "b", 1995, 1
)
dat
#> # A tibble: 7 x 3
#>   organization  year count
#>   <chr>        <dbl> <dbl>
#> 1 a             1990     1
#> 2 a             1991     1
#> 3 b             1991     1
#> 4 c             1992     1
#> 5 c             1993     0
#> 6 a             1994     1
#> 7 b             1995     1

# NOTE incorrect results for comparison
dat %>%
  group_by(organization, year) %>%
  summarise(total = sum(count)) %>%
  ggplot(aes(x = year, y = cumsum(total), fill = organization)) +
  geom_area()
#> `summarise()` regrouping output by 'organization' (override with `.groups` argument)


# Fill out all years and organization combinations
complete_dat <- tidyr::expand(dat, organization, year = 1990:1995)
complete_dat
#> # A tibble: 18 x 2
#>    organization  year
#>    <chr>        <int>
#>  1 a             1990
#>  2 a             1991
#>  3 a             1992
#>  4 a             1993
#>  5 a             1994
#>  6 a             1995
#>  7 b             1990
#>  8 b             1991
#>  9 b             1992
#> 10 b             1993
#> 11 b             1994
#> 12 b             1995
#> 13 c             1990
#> 14 c             1991
#> 15 c             1992
#> 16 c             1993
#> 17 c             1994
#> 18 c             1995

# Update data so that counting works and fills in gaps
final_dat <- complete_dat %>%
  left_join(dat, by = c("organization", "year")) %>%
  replace_na(list(count = 0)) %>%  # Replace NA with zeros
  arrange(organization, year) %>%  # Arrange by year so adding works
  group_by(organization) %>%
  mutate(aggcount = cumsum(count))
final_dat
#> # A tibble: 18 x 4
#> # Groups:   organization [3]
#>    organization  year count aggcount
#>    <chr>        <dbl> <dbl>    <dbl>
#>  1 a             1990     1        1
#>  2 a             1991     1        2
#>  3 a             1992     0        2
#>  4 a             1993     0        2
#>  5 a             1994     1        3
#>  6 a             1995     0        3
#>  7 b             1990     0        0
#>  8 b             1991     1        1
#>  9 b             1992     0        1
#> 10 b             1993     0        1
#> 11 b             1994     0        1
#> 12 b             1995     1        2
#> 13 c             1990     0        0
#> 14 c             1991     0        0
#> 15 c             1992     1        1
#> 16 c             1993     0        1
#> 17 c             1994     0        1
#> 18 c             1995     0        1

# Plot results
final_dat %>%
  ggplot(aes(x = year, y = aggcount, fill = organization)) +
  geom_area()

Created on 2020-12-10 by the reprex package (v0.3.0)
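
To see why the original `cumsum(total)` inside `aes()` goes wrong, here is a minimal sketch using the same sample data. As far as I understand it, ggplot2 evaluates that expression over the whole data frame rather than once per organization, so the running total carries over from one organization to the next. The names `totals`, `across_all`, and `per_org` are only illustrative.

```r
library(dplyr)

# Same sample data as above
dat <- tibble::tribble(
  ~organization, ~year, ~count,
  "a", 1990, 1,
  "a", 1991, 1,
  "b", 1991, 1,
  "c", 1992, 1,
  "c", 1993, 0,
  "a", 1994, 1,
  "b", 1995, 1
)

# One row per organization and year, sorted by organization then year
totals <- dat %>%
  group_by(organization, year) %>%
  summarise(total = sum(count), .groups = "drop")

# Roughly what cumsum() inside aes() sees: one running total over all rows
across_all <- cumsum(totals$total)

# What the plot needs: a running total that restarts for each organization
per_org <- totals %>%
  group_by(organization) %>%
  mutate(aggcount = cumsum(total)) %>%
  pull(aggcount)

across_all
per_org
```

Comparing the two vectors shows that organization b starts from a's final total in the first case, whereas in the second case each organization starts counting from its own first row.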

Eric Leung
  • This works excellently! I can get through the first part, but when I go to plot I get the message 'Error: Aesthetics can not vary with a ribbon'. Any idea what that might refer to? – Alyssa C Dec 08 '20 at 17:23
  • There might be a problem with what you passed into `fill =`. This is the aesthetic that varies, so it should be categorical data, not numeric. I'm unsure if it'll help much, but this issue has been asked about elsewhere, which might be a place to start digging if the problem persists: https://stackoverflow.com/questions/57333161 – Eric Leung Dec 08 '20 at 17:30
  • And if this answer works to solve your problem, feel free to accept this answer :) – Eric Leung Dec 08 '20 at 17:31
  • Actually, there is still some problem. I changed the fill to factor but I am still getting a chopped-up figure with huge gaps between the areas. Any ideas why that's happening? Sorry, this is a real trouble spot for me! – Alyssa C Dec 10 '20 at 19:36
  • I suspect your summing of counts is not actually cumulative, based on your original post. Please take a look at the code I've shared and note how I've calculated the cumulative sum using `arrange(organization, year) %>% group_by(organization) %>% mutate(aggcount = cumsum(count))`. And make sure you have a data point for the first and last years in your data. Without them, the areas cannot connect and show the actual cumulative sum over time. I hope that helps. – Eric Leung Dec 10 '20 at 21:12
  • Yes, so I do have both first and last years in the data. It looks like there is something actually happening in years where there is a big jump in data ...and/or it's not cumulative? See image here: https://ibb.co/8m0GzWc I'm sorry, do you have any clue why that might be? I've tried geom_ribbon, stat_bin, etc etc. – Alyssa C Dec 10 '20 at 21:44
  • Thanks for the image. With that and looking at the plot again, it looks like the plot isn't entirely accurate. For example, some of the years for my example organization a are not accurate and lose some counts. I've updated the code to add in all years for organizations. That should fill in those gaps you're seeing. And make sure you're calculating the cumulative sum correctly. The image you shared doesn't appear to be adding correctly. – Eric Leung Dec 10 '20 at 23:10
  • Ah, so it seems to be adding fine in the final_dat data frame, however the plot is still chopped up: https://ibb.co/8m0GzWc The odd thing is that I can ALMOST get the plot I want using geom_ribbon, but I can't get the ribbons to stack atop one another: https://ibb.co/pX5YVrj. I apologize, this is very mundane and driving me nuts for such a simple plot! – Alyssa C Dec 11 '20 at 04:58
  • Without seeing the data, it is difficult to troubleshoot. I feel your pain in making the right plot! It is strange that your plots aren't coming out right. One of the recent plots you shared appears to be the same one you've shared previously. Two things I would try at this point are to make the plot wider and to take a closer look at the data frame. A wider plot might fix those gaps with large changes. I would then advise inspecting your data frame very carefully and comparing it to the data frames I've created and the ones shown in the examples you initially shared. – Eric Leung Dec 11 '20 at 15:16
  • So bizarre--yes that second plot I shared was the same as before, I get the same plot when I put in the new code. It's bizarre, it seems to not be getting the right cumulative x value even with the aggcount input. This is what my final_dat looks like just before plotting: – Alyssa C Dec 11 '20 at 21:03
  • # A tibble: 140 x 4 # Groups: RFMO [5] RFMO YEAR count aggcount 1 CCSBT 1995 1 1 2 CCSBT 1996 0 1 3 CCSBT 1997 0 1 4 CCSBT 1998 0 1 5 CCSBT 1999 0 1 6 CCSBT 2000 0 1 7 CCSBT 2001 0 1 8 CCSBT 2002 0 1 9 CCSBT 2003 1 2 10 CCSBT 2004 1 3 # … with 130 more rows – Alyssa C Dec 11 '20 at 21:06
  • Wow that's terrible formatting, sorry. It's just weird that it all seems exactly the same, except in the out put of the plot. Sorry feel free to ignore me! just going to keep trying things... – Alyssa C Dec 11 '20 at 21:08
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/225846/discussion-between-eric-leung-and-alyssa-c). – Eric Leung Dec 11 '20 at 22:37
1

Each organization needs a row for every year, even when its count for that year is 0. Then add a mutate() step to your code to compute the cumulative total, and plot that.

Here is the example data remade so that each organization has a value for every year, with some being 0:

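library(dplyr)
library(ggplot2)

# Every organization has a row for every year; some counts are 0
df <- data.frame(
  time = rep(c(1990, 1991, 1992), 3),
  org = c("a", "a", "a", "b", "b", "b", "c", "c", "c"),
  num = c(1, 0, 1, 0, 0, 1, 1, 1, 1)
)

df %>%
  group_by(org, time) %>%
  summarise(total = sum(num)) %>%
  # summarise() drops the last grouping level, so cumsum() runs within each org
  mutate(newtot = cumsum(total)) %>%
  ggplot(aes(x = time, y = newtot, fill = org)) +
  geom_area()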
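
Before plotting, it may help to confirm that the data really has one row per organization and year, since a missing or duplicated combination could leave gaps in the stacked areas. The following is only a rough sketch of such a check on the example data above; `n_years`, `rows_per_org`, `grid_is_complete`, and `has_duplicates` are made-up names for illustration.

```r
library(dplyr)

# Same example data as in this answer
df <- data.frame(
  time = rep(c(1990, 1991, 1992), 3),
  org = c("a", "a", "a", "b", "b", "b", "c", "c", "c"),
  num = c(1, 0, 1, 0, 0, 1, 1, 1, 1)
)

# Number of distinct years that every organization should cover
n_years <- n_distinct(df$time)

# Number of rows each organization actually has
rows_per_org <- df %>% count(org, name = "n_rows")

# No (org, time) pair should appear more than once
has_duplicates <- any(duplicated(df[c("org", "time")]))

# With no duplicates, matching row counts mean every org covers every year
grid_is_complete <- !has_duplicates && all(rows_per_org$n_rows == n_years)

grid_is_complete
```

If `grid_is_complete` comes back `FALSE` on your own data, the missing combinations would need to be filled in with zeros before computing the cumulative sum.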
Jonni
0

Adding on to @Eric Leung's answer:

You can use tidyr::complete() to simplify the code a bit (code run Nov 2022):

library(ggplot2)
library(dplyr)
library(tidyr)

# Create sample data
dat <- tribble(
  ~organization, ~year, ~count,
  "a", 1990, 1,
  "a", 1991, 1,
  "b", 1991, 1,
  "c", 1992, 1,
  "c", 1993, 0,
  "a", 1994, 1,
  "b", 1995, 1
)

# "Complete" the data, fill in count = 0
final_dat <- dat %>%
  complete(organization, year = 1990:1995, fill = list(count = 0)) %>%
  group_by(organization) %>%
  arrange(year) %>%  # Arrange by year so adding works
  mutate(aggcount = cumsum(count)) %>% 
  ungroup()

final_dat

# Plot results
final_dat %>%
  ggplot(aes(x = year, y = aggcount, fill = organization)) +
  geom_area()

[image: cumulative geom_area plot]
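
As a possible variation, the year range does not have to be hard-coded as 1990:1995. The sketch below assumes the years are whole numbers one apart and uses tidyr::full_seq() to build the range from the data itself. It also sorts by organization and year before the cumulative sum, which is a choice made here to keep the printed output easy to read, not something the approach above requires.

```r
library(dplyr)
library(tidyr)

# Same sample data as above
dat <- tribble(
  ~organization, ~year, ~count,
  "a", 1990, 1,
  "a", 1991, 1,
  "b", 1991, 1,
  "c", 1992, 1,
  "c", 1993, 0,
  "a", 1994, 1,
  "b", 1995, 1
)

final_dat <- dat %>%
  # full_seq(year, 1) spans min(year) to max(year) in steps of 1
  complete(organization, year = full_seq(year, period = 1),
           fill = list(count = 0)) %>%
  arrange(organization, year) %>%
  group_by(organization) %>%
  mutate(aggcount = cumsum(count)) %>%
  ungroup()

final_dat
```

The resulting `final_dat` can be passed to the same `ggplot()` and `geom_area()` call shown above.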

trangdata