0

I have a dataframe like this:

   index value  timestamp
1     A     2  03/01/2019
2     B     3  03/01/2019
3     C     5  03/01/2019
4     D     8  03/01/2019
5     A     2  03/02/2019
6     B    12  03/02/2019
7     C     4  03/02/2019
8     D     3  03/02/2019
9     A     2  03/03/2019
10    B     1  03/03/2019
11    C     1  03/03/2019
12    D     2  03/03/2019

I would like to add new rows "B+C" and sum the "value" filed within each timestamp. Here is the intended output:

  index value  timestamp
1     A     2  03/01/2019
2   B+C     8  03/01/2019
3     D     8  03/01/2019
4     A     2  03/02/2019
5   B+C    16  03/02/2019
6     D     3  03/02/2019
7     A     2  03/03/2019
8   B+C     2  03/03/2019
9     D     2  03/03/2019
Qzhao
  • 57
  • 3

1 Answers1

1

You can change 'B' and 'C' values to 'B + C'.

df$index[df$index %in% c('B', 'C')] <- 'B + C'

then it becomes group by sum problem which can be done in base R :

aggregate(value~timestamp + index, df, sum)

#   timestamp index value
#1 03/01/2019     A     2
#2 03/02/2019     A     2
#3 03/03/2019     A     2
#4 03/01/2019 B + C     8
#5 03/02/2019 B + C    16
#6 03/03/2019 B + C     2
#7 03/01/2019     D     8
#8 03/02/2019     D     3
#9 03/03/2019     D     2

dplyr :

library(dplyr)
df %>% group_by(timestamp, index) %>% summarise(value = sum(value))

Or data.table :

library(data.table)
setDT(df)[, .(value = sum(value)), .(timestamp,index)]

data

df <- structure(list(index = c("A", "B", "C", "D", "A", "B", "C", "D", 
"A", "B", "C", "D"), value = c(2L, 3L, 5L, 8L, 2L, 12L, 4L, 3L, 
2L, 1L, 1L, 2L), timestamp = c("03/01/2019", "03/01/2019", "03/01/2019", 
"03/01/2019", "03/02/2019", "03/02/2019", "03/02/2019", "03/02/2019", 
"03/03/2019", "03/03/2019", "03/03/2019", "03/03/2019")), 
class = "data.frame", row.names = c(NA, -12L))
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213