
The data frame consists of the variables Date, Type and Total, where Type is either Buy or Sell.

How can we group the observations so that only neighboring observations of the same Type are grouped together, and then sum the Total of all observations in each group? In other words, we keep adding the next observation to the current group until the value of Type changes.

For example, in the data frame below, the groups are as follows:

  • Obs 1 & 2
  • Obs 3 & 4
  • Obs 5 & 6
  • Obs 7, 8 & 9


Reproducible data, thank you @bgoldst:

df1 <- data.frame(Date=rep(as.POSIXct('2016-06-16 06:27:39'),9L),
                  Type=c('Buy','Buy','Sell','Sell','Buy','Buy','Sell','Sell','Sell'),
                  Total=c(1.548012e+01,1.051480e+02,5.956740e+00,3.872415e+01,1.333391e+02,1.941060e-01,1.941060e-01,1.941060e-01,3.277059e-01))
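
As a quick check of the expected grouping (an illustrative sketch, not part of the original question), base R's rle() reports the consecutive runs of Type directly:

# run-length encoding of Type: lengths 2, 2, 2, 3 with values Buy, Sell, Buy, Sell,
# i.e. exactly the four groups listed above
with(df1, rle(as.character(Type)))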

2 Answers


Here's a slightly ugly base R solution built around aggregate(). It compares consecutive elements of Type for inequality and applies cumsum() to the result to synthesize a transient grouping column, so that separate runs of the same Type value end up in different groups.

df <- data.frame(Date=rep(as.POSIXct('2016-06-16 06:27:39'),9L),
                 Type=c('Buy','Buy','Sell','Sell','Buy','Buy','Sell','Sell','Sell'),
                 Total=c(1.548012e+01,1.051480e+02,5.956740e+00,3.872415e+01,1.333391e+02,1.941060e-01,1.941060e-01,1.941060e-01,3.277059e-01));
aggregate(Total~Date+Type+TypeSeq,transform(df,TypeSeq=c(0L,cumsum(Type[-1L]!=Type[-nrow(df)]))),sum)[-3L];
##                  Date Type       Total
## 1 2016-06-16 06:27:39  Buy 120.6281200
## 2 2016-06-16 06:27:39 Sell  44.6808900
## 3 2016-06-16 06:27:39  Buy 133.5332060
## 4 2016-06-16 06:27:39 Sell   0.7159179
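
To make the cumsum() step more concrete (an illustration added here, not part of the original answer), the synthesized TypeSeq column starts at 0 and increments each time Type differs from the previous row, so every consecutive run of the same Type gets its own id:

# grouping ids synthesized by comparing consecutive Type values and cumsum()-ing the changes
transform(df,TypeSeq=c(0L,cumsum(Type[-1L]!=Type[-nrow(df)])))$TypeSeq;
## [1] 0 0 1 1 2 2 3 3 3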

Same idea implemented with data.table:

library(data.table);
dt <- as.data.table(df);
dt[,.(Total=sum(Total)),.(Date,Type,TypeSeq=c(0L,cumsum(Type[-1L]!=Type[-nrow(dt)])))][,-3L,with=F];
##                   Date Type       Total
## 1: 2016-06-16 06:27:39  Buy 120.6281200
## 2: 2016-06-16 06:27:39 Sell  44.6808900
## 3: 2016-06-16 06:27:39  Buy 133.5332060
## 4: 2016-06-16 06:27:39 Sell   0.7159179
  • I had a similar post yesterday [regarding grouping part](http://stackoverflow.com/questions/37809094/create-group-names-for-consecutive-values), aggregate part is answered many times already. – zx8754 Jun 16 '16 at 07:34

Simple solution using data.table (latest stable, v1.9.6 on CRAN):

require(data.table)
# Create group id *and* aggregate in one-go using expressions in 'by'
setDT(df)[, .(total = sum(Total)), by=.(group=rleid(Type), Date)]

#    group                Date       total
# 1:     1 2016-06-16 06:27:39 120.6281200
# 2:     2 2016-06-16 06:27:39  44.6808900
# 3:     3 2016-06-16 06:27:39 133.5332060
# 4:     4 2016-06-16 06:27:39   0.7159179
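
For reference (an illustration, not part of the original answer), rleid() simply assigns a new integer id at every change in value, which is exactly the consecutive-run grouping asked for:

# with data.table loaded, rleid() numbers each run of identical values
rleid(c('Buy','Buy','Sell','Sell','Buy','Buy','Sell','Sell','Sell'))
# [1] 1 1 2 2 3 3 4 4 4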