Here's a sample of the relevant part of my data:

Name  Year.published  Name.meaning
a     2021            Location
b     2021            Location
c     2019            Location
d     2015            Location
e     2015            Person
f     2015            Conceptual
g     2010            Location

I used the code below to create the first graph. It plots n per year for each group, rather than a cumulative count that adds up over time.

I tried wrapping the y variable in cumsum() to get a cumulative count, but that seems to accumulate across all groups at once (second image), so every line climbs to around 70, which is far higher than any single group's total.

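library(ggplot2)
library(dplyr)

# Count rows per year and category
df <- paperstats %>% dplyr::count(Year.published, Name.meaning)

# One line per category; y is the per-year count, not a running total
ggplot(df,
       aes(x = Year.published,
           y = n,
           color = Name.meaning)) +
  geom_line()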

[Image 1: line plot of n per year, one line per Name.meaning]

[Image 2: the same plot with cumsum() applied to y]

Is there any way to apply cumsum() per line, or an alternative method of counting items per group over time in a line plot?

harry1027
    Please [make this question reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) by including a small representative dataset in a plain text format - for example the output from `dput(paperstats)`, if that is not too large. – neilfws Aug 22 '22 at 01:47

1 Answer

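I think you need to group by Name.meaning, arrange by Name.meaning and Year.published, then calculate the cumsum before plotting. Inside mutate() on a grouped data frame, cumsum() runs separately for each group, so each line gets its own running total.

It's difficult to test without seeing your data, but something like this should work: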

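paperstats %>%
  # Count rows per year and category
  count(Year.published, Name.meaning) %>%
  group_by(Name.meaning) %>%
  # Sort by year within each category so the running total follows time
  arrange(Name.meaning, Year.published) %>%
  # cumsum() is applied per group because the data is grouped
  mutate(cs = cumsum(n)) %>%
  ungroup() %>%
  ggplot(aes(Year.published, cs)) +
  geom_line(aes(color = Name.meaning))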
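
For what it's worth, here is a self-contained sketch of the same idea using only the seven sample rows from the question, so the full paperstats data is not needed. The names cumulative and p are just placeholders I chose, and the geom_point() layer is my own addition: with this small sample, Person and Conceptual each have a single year, and a one-point group draws no line.

```r
library(dplyr)
library(ggplot2)

# Sample rows copied from the question (not the full dataset)
paperstats <- data.frame(
  Name = c("a", "b", "c", "d", "e", "f", "g"),
  Year.published = c(2021, 2021, 2019, 2015, 2015, 2015, 2010),
  Name.meaning = c("Location", "Location", "Location", "Location",
                   "Person", "Conceptual", "Location")
)

# Per-year counts, then a running total within each Name.meaning
cumulative <- paperstats %>%
  count(Year.published, Name.meaning) %>%
  arrange(Name.meaning, Year.published) %>%
  group_by(Name.meaning) %>%
  mutate(cs = cumsum(n)) %>%
  ungroup()

# One line per category; points keep single-year categories visible
p <- ggplot(cumulative, aes(Year.published, cs, color = Name.meaning)) +
  geom_line() +
  geom_point()
```

On the sample, the Location line should step through 1, 2, 3, 5 for 2010, 2015, 2019, 2021, while Person and Conceptual each stay at 1. Printing p (or calling print(p) in a script) draws the plot.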
neilfws