
I have a task that I'd like to accomplish in dplyr, but I haven't been able to work out how.

I have a data frame with a year, a factor, and a value. I want to create a new column (mutate) that divides each value by the sum of all values within the same year (group_by). The table below shows what I want to accomplish; my df already has the first three columns.

year  factor    value    share
1977     a      564907   value / sum(value for year 1977)
1977     l     2852949   value / sum(value for year 1977)
1978     a      504028   value / sum(value for year 1978)
1978     1      413120   value / sum(value for year 1978)
1978     y     2553088   value / sum(value for year 1978)
1979     a      497766   value / sum(value for year 1979)
1979     c      789007   value / sum(value for year 1979)

As expected,

df %>% group_by(year) %>% summarize(year.total = sum(value))

drops the value column so I can't continue with creating the share column.

I think I need a conditional mutate, something like %>% mutate(share = value / (sum of value over all rows whose year matches the current row's year)). And yes, the number of rows per year varies.
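
For reference, here is a minimal sketch of the kind of grouped mutate described above. It assumes the data frame is named df and has the columns year, factor, and value shown in the table; the sample data is copied from that table, and the name result is only illustrative. Inside a grouped mutate, sum(value) is evaluated per group, so every row is kept and each value is divided by its own year's total. This is one possible approach, not necessarily the only or best one.

```r
library(dplyr)

# Sample data copied from the table above (first three columns only)
df <- data.frame(
  year   = c(1977, 1977, 1978, 1978, 1978, 1979, 1979),
  factor = c("a", "l", "a", "1", "y", "a", "c"),
  value  = c(564907, 2852949, 504028, 413120, 2553088, 497766, 789007)
)

result <- df %>%
  group_by(year) %>%                    # sum(value) below is computed per year
  mutate(share = value / sum(value)) %>% # all rows are kept, unlike summarize()
  ungroup()                             # drop the grouping once share is added

print(result)
```

Because mutate() preserves the number of rows, the variable number of rows per year should not matter here; the shares within each year add up to 1.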

David Arenburg
zazizoma
