0

Suppose you have a large df and you want a simple and fast way to get df1 from df (a large R dataframe):

df:

index  var1  var2  var3  var4
  0      2     4     8     7
  1      2     3     9     6
  2      1     5    10     8
  2      1     5     7     8
  2      2     9    33    10
  2      2     9    17    10
  3      3     6     6     9
  3      4     7    11    10

and the idea is to aggregate (by sum) the rows with the same index and the same var1 without changing the others. Notice that var4 is the same per index var2 combination.

df1: 

index  var1  var2  var3  var4
  0      2     4     8     7
  1      2     3     9     6
  2      1     5    17     8
  2      2     9    50    10
  3      3     6     6     9
  3      4     7    11    10
StivJ
  • 55
  • 5
  • You only want to sum up column `var3`? What should happen with the other columns if they are not unique over aggregation or are they unique per group? – GKi Dec 02 '19 at 15:08
  • [this question](https://stackoverflow.com/questions/1660124/how-to-sum-a-variable-by-group) is about how to sum 1 column, grouped by 1 other column. Your question is about how to sum m columns, grouped by n other columns (m and n > 1), but most of the answers there can be used for your situation (m and n > 1) also. – IceCreamToucan Dec 02 '19 at 15:18
  • Yes @GKi , the other columns are unique. per var1 group. Also, this is how IceCreamToucan says, grouped by n other columns (m and n>1) – StivJ Dec 02 '19 at 15:30
  • OK. Then you can use them also for grouping to keep them in the output. – GKi Dec 02 '19 at 15:38

2 Answers2

2

Maybe you can use aggregate() as below

v <- aggregate(df[-(1:2)], df[1:2], function(x) sum(unique(x)))
res <- v[order(v$index),]

and thus

> res
  index var1 var2 var3 var4
2     0    2    4    8    7
3     1    2    3    9    6
1     2    1    5   17    8
4     2    2    9   50   10
5     3    3    6    6    9
6     3    4    7   11   10
ThomasIsCoding
  • 96,636
  • 9
  • 24
  • 81
  • In case `var3` holds the same value more than one time inside a group it might come to an unexpected `sum`. – GKi Dec 02 '19 at 15:59
  • @GKi well.....I guess OP just want to keep the duplicates instead of adding them up, so I used `sum(unique(x))`, but not sure the rule for output – ThomasIsCoding Dec 02 '19 at 16:14
  • Thanks again @ThomasIsCoding , I was in trouble getting the rest of the columns. – StivJ Dec 02 '19 at 20:30
0

You can use rowsum to aggregate by sum.

rowsum(x[4], interaction(x[-4]))
#1.2.3.6     9
#0.2.4.7     8
#2.1.5.8    17
#3.3.6.9     6
#3.4.7.10   11
#2.2.9.10   50

or use aggregate:

aggregate(var3 ~ ., x, sum)
#  index var1 var2 var4 var3
#1     1    2    3    6    9
#2     0    2    4    7    8
#3     2    1    5    8   17
#4     3    3    6    9    6
#5     3    4    7   10   11
#6     2    2    9   10   50

Data:

x <- structure(list(index = c(0L, 1L, 2L, 2L, 2L, 2L, 3L, 3L), var1 = c(2L, 
2L, 1L, 1L, 2L, 2L, 3L, 4L), var2 = c(4L, 3L, 5L, 5L, 9L, 9L, 
6L, 7L), var3 = c(8L, 9L, 10L, 7L, 33L, 17L, 6L, 11L), var4 = c(7L, 
6L, 8L, 8L, 10L, 10L, 9L, 10L)), class = "data.frame", row.names = c(NA, 
-8L))
GKi
  • 37,245
  • 2
  • 26
  • 48