I have a problem similar to the one in an earlier question, "How to sum a variable by group?", but my data frame has more than two variables. It looks a little like this:

A   B    C      D        E 
1   m   1990    1989    200 
1   m   1990    1990    100
1   m   1991    1989    10 
2   m   1991    1990    20 
2   m   1991    1991    100
3   m   1992    1989    30 
3   m   1992    1990    20 
3   m   1992    1991    10
4   m   1992    1992    10 
4   m   1993    1989    50

I want to drop the variable D and sum E for every combination of A, B and C, without losing the other variables. I tried the advice given in the linked question (aggregate, by, etc.), but I ended up with only two variables. I want something like this:

A    B   C      E
1   m   1990    300
1   m   1991    10
2   m   1991    120
3   m   1992    60
4   m   1992    10
4   m   1993    50

Thank you in advance!

(This is my first question, so please let me know if it's inappropriate / missing something.)

anna

2 Answers

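Check out the dplyr package. The solution would be something like:

library(dplyr)
data <- your_data
# summarise() collapses each A/B/C group to a single row; mutate() would keep every original row
data_summed <- data %>% group_by(A, B, C) %>% summarise(E = sum(E)) %>% ungroup()

Column D is dropped automatically, because summarise() keeps only the grouping columns and the summary. If you use mutate() instead, every original row is kept with the group total added, and you would then need dplyr's select() to keep only the columns of interest (filter() picks rows, not columns).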

For variations, check out the dplyr cheatsheet; it's great.

Dason
NWaters

I think aggregate(E ~ A + B + C, data=df, FUN=sum) should do the trick. This splits the data on columns A, B and C and computes the sum of E within each group. Here it is on a copy of your data with lowercase column names:

> aggregate(e ~ a+b+c, data=df, FUN=sum)

  a b    c   e
1 1 m 1990 300
2 1 m 1991  10
3 2 m 1991 120
4 3 m 1992  60
5 4 m 1992  10
6 4 m 1993  50
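
For anyone who wants to reproduce this, here is a minimal base R sketch. The data frame is retyped from the table in the question, so the name df and the uppercase column names are assumptions about how the real data is stored.

```r
# Rebuild the example data from the question (values retyped from the post).
df <- data.frame(
  A = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4),
  B = "m",
  C = c(1990, 1990, 1991, 1991, 1991, 1992, 1992, 1992, 1992, 1993),
  D = c(1989, 1990, 1989, 1990, 1991, 1989, 1990, 1991, 1992, 1989),
  E = c(200, 100, 10, 20, 100, 30, 20, 10, 10, 50)
)

# Sum E within every combination of A, B and C.
# D does not appear in the formula, so it is left out of the result.
result <- aggregate(E ~ A + B + C, data = df, FUN = sum)
print(result)
```

Because D is not named in the formula, there is no separate step needed to remove it, and A, B and C are all kept as columns.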
mattdevlin