Sum rows depending on two conditions and keep conditions

Question

I've spent a good amount of time looking for a solution to my problem on Stack Overflow and other R-related websites. My data frame (diva3) looks like this:

structure(list(cyear = c(1933, 1933, 1934, 1934, 1935, 1935, 1933, 1933, 1934, 1934),
relativeyear = c(-7, -7, -7, -7, -7, -7, -6, -6, -6, -6),
dollarandiv = c(416, 358, 304, 214, 158, 507, 236, 417, 242, 248)),
.Names = c("cyear", "relativeyear", "dollarandiv"),
row.names = c(NA, 10L), class = "data.frame")

The real data set has around 19,600 observations, where $cyear$ ranges from 1933 to 2000 and $relativeyear$ from -7 to 51.
I want to find the sum of $dollarandiv$ where $cyear$ = 1933 and $relativeyear$ = -7. The same should then apply to every other combination, for example the sum where $cyear$ = 1934 and $relativeyear$ = -6.
In my data set, it is nicely structured, so the sum should happen at every 5th row, but I would like to keep the relevant $cyear$ and $relativeyear$ for the summarized result. Thus far, I've tried to use dplyr by doing the following:

work.total <- diva3 %>%
  transmute(relativeyear = relativeyear,
            cyear = cyear,
            dollarandiv = dollarandiv) %>%
  group_by(cyear) %>%
  summarize(tdollarandiv = sum(dollarandiv))

This yields the wrong output: because I group only by $cyear$, R sums across all values of $relativeyear$, even though I would like to keep them separate (i.e. -7, -6, -5, ..., 51). The ideal result for the structure above should look like this:

structure(list(cyear = c(1933, 1934, 1935, 1933, 1934),
relativeyear = c(-7, -7, -7, -6, -6),
dollarandiv = c(774, 518, 665, 653, 490)),
.Names = c("cyear", "relativeyear", "dollarandiv"),
row.names = c(NA, 5L), class = "data.frame")

I feel as though I've looked through every previous SO post that should be relevant to this, but I can't seem to adapt any of them to my problem. In my mind the solution should be simple, but for some reason I can't figure it out. I hope someone here can help me.

score 0 · Accepted Answer · answered May 11 '21 at 06:07


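Try this: group by both cyear and relativeyear, so that each combination gets its own sum and both columns are kept in the result.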

diva3 %>%
  group_by(cyear, relativeyear) %>%
  summarize(tdollarandiv = sum(dollarandiv), .groups = 'drop')
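To check this against the numbers in the question, here is a minimal sketch that rebuilds the ten sample rows with data.frame() (in place of the dput output) and runs the same pipeline. The base R aggregate() call at the end is an alternative I'm adding for comparison, not something from the question, and work.total.base is just an illustrative name.

```r
library(dplyr)

# Sample rows from the question, rebuilt with data.frame() for readability
diva3 <- data.frame(
  cyear        = c(1933, 1933, 1934, 1934, 1935, 1935, 1933, 1933, 1934, 1934),
  relativeyear = c(-7, -7, -7, -7, -7, -7, -6, -6, -6, -6),
  dollarandiv  = c(416, 358, 304, 214, 158, 507, 236, 417, 242, 248)
)

# One output row per (cyear, relativeyear) pair; both keys stay as columns
work.total <- diva3 %>%
  group_by(cyear, relativeyear) %>%
  summarize(tdollarandiv = sum(dollarandiv), .groups = "drop")

# Base R alternative (no packages needed); the summed column keeps
# its original name, dollarandiv
work.total.base <- aggregate(dollarandiv ~ cyear + relativeyear,
                             data = diva3, FUN = sum)

print(work.total)
print(work.total.base)
```

Both results contain the five sums expected in the question (774, 518, 665, 653, 490). The dplyr result is sorted by cyear first, so if the row order of the desired output matters, adding arrange(relativeyear, cyear) to the end of the pipeline should reproduce it.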


AnilGoyal
