
The data frame consists of the variables Date, Type and Total, where Type is either Buy or Sell.

How can we group the observations so that only neighboring observations of the same Type are grouped together, and then sum the Total of all observations in each group? In other words, we keep adding the next observation to the current group until the value of Type changes.

For example, in the data frame below, the groups are as follows:

  • Obs 1 & 2
  • Obs 3 & 4
  • Obs 5 & 6
  • Obs 7, 8 & 9


Reproducible data, thank you @bgoldst:

df1 <- data.frame(Date=rep(as.POSIXct('2016-06-16 06:27:39'),9L),
                  Type=c('Buy','Buy','Sell','Sell','Buy','Buy','Sell','Sell','Sell'),
                  Total=c(1.548012e+01,1.051480e+02,5.956740e+00,3.872415e+01,1.333391e+02,1.941060e-01,1.941060e-01,1.941060e-01,3.277059e-01))
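
As a quick check of the expected grouping (an illustrative sketch, not part of the original question), base R's rle() reports the consecutive runs of Type directly:

# run-length encoding of Type: lengths 2, 2, 2, 3 with values Buy, Sell, Buy, Sell,
# i.e. exactly the four groups listed above
with(df1, rle(as.character(Type)))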

2 Answers


Here's a slightly ugly base R solution built around aggregate(). It compares consecutive elements of Type for inequality and applies cumsum() to the result to synthesize a transient grouping column, so that separate runs of the same Type value end up in different groups.

df <- data.frame(Date=rep(as.POSIXct('2016-06-16 06:27:39'),9L),
                 Type=c('Buy','Buy','Sell','Sell','Buy','Buy','Sell','Sell','Sell'),
                 Total=c(1.548012e+01,1.051480e+02,5.956740e+00,3.872415e+01,1.333391e+02,1.941060e-01,1.941060e-01,1.941060e-01,3.277059e-01));
aggregate(Total~Date+Type+TypeSeq,transform(df,TypeSeq=c(0L,cumsum(Type[-1L]!=Type[-nrow(df)]))),sum)[-3L];
##                  Date Type       Total
## 1 2016-06-16 06:27:39  Buy 120.6281200
## 2 2016-06-16 06:27:39 Sell  44.6808900
## 3 2016-06-16 06:27:39  Buy 133.5332060
## 4 2016-06-16 06:27:39 Sell   0.7159179
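
To make the cumsum() step more concrete (an illustration added here, not part of the original answer), the synthesized TypeSeq column starts at 0 and increments each time Type differs from the previous row, so every consecutive run of the same Type gets its own id:

# grouping ids synthesized by comparing consecutive Type values and cumsum()-ing the changes
transform(df,TypeSeq=c(0L,cumsum(Type[-1L]!=Type[-nrow(df)])))$TypeSeq;
## [1] 0 0 1 1 2 2 3 3 3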

Same idea implemented with data.table:

library(data.table);
dt <- as.data.table(df);
dt[,.(Total=sum(Total)),.(Date,Type,TypeSeq=c(0L,cumsum(Type[-1L]!=Type[-nrow(dt)])))][,-3L,with=F];
##                   Date Type       Total
## 1: 2016-06-16 06:27:39  Buy 120.6281200
## 2: 2016-06-16 06:27:39 Sell  44.6808900
## 3: 2016-06-16 06:27:39  Buy 133.5332060
## 4: 2016-06-16 06:27:39 Sell   0.7159179
  • I had a similar post yesterday [regarding grouping part](http://stackoverflow.com/questions/37809094/create-group-names-for-consecutive-values), aggregate part is answered many times already. – zx8754 Jun 16 '16 at 07:34

Simple solution using data.table (latest stable, v1.9.6 on CRAN):

require(data.table)
# Create group id *and* aggregate in one-go using expressions in 'by'
setDT(df)[, .(total = sum(Total)), by=.(group=rleid(Type), Date)]

#    group                Date       total
# 1:     1 2016-06-16 06:27:39 120.6281200
# 2:     2 2016-06-16 06:27:39  44.6808900
# 3:     3 2016-06-16 06:27:39 133.5332060
# 4:     4 2016-06-16 06:27:39   0.7159179
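
For reference (an illustration, not part of the original answer), rleid() simply assigns a new integer id at every change in value, which is exactly the consecutive-run grouping asked for:

# with data.table loaded, rleid() numbers each run of identical values
rleid(c('Buy','Buy','Sell','Sell','Buy','Buy','Sell','Sell','Sell'))
# [1] 1 1 2 2 3 3 4 4 4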