3

There are 3 parts to this problem:

1) I want to sum values in column b,c,d for any two adjacent rows which have the same values for columns(b,c,d)

2) I would like to keep values in other columns the same. (Some other column (eg. a) may contain character data.)

3) I would like to keep the changes by replacing the original value in columns b,c,d in the first row (of the 2 same rows) with the new values (the sums) and delete the second row(of the 2 same rows).

   Time         a    b     c     d    id 
1  2014/10/11   A   40    20    10     1
2  2014/10/12   A   40    20    10     2
3  2014/10/13   B    9    10     9     3
4  2014/10/14   D   16     5    12     4
5  2014/10/15   D    1     6     5     5
6  2014/10/16   B   20     7     8     6
7  2014/10/17   B   20     7     8     7
8  2014/10/18   A   11     9     5     8
9  2014/10/19   C   31    20    23     9

Expected outcome:

    Time         a    b     c     d    id 
 1  2014/10/11   A   80    40    20     1 *
 3  2014/10/13   B    9    10     9     3
 4  2014/10/14   D   16     5    12     4
 5  2014/10/15   D    1     6     5     5
 6  2014/10/16   B   40    14    16     6 *
 8  2014/10/18   A   11     9     5     8
 9  2014/10/19   C   31    20    23     9

id 1 and 2 combined to become id 1; id 6 and 7 combined to become id 6.

Thank you. Any contribution is greatly appreciated.

nilsinelabore
  • 4,143
  • 17
  • 65
  • 122

1 Answers1

2

Using dplyr functions along with data.table::rleid. To get same values for adjacent b, c and d columns we paste them and use rleid to create groups. For each group we sum the values at b, c and d columns and keep only the 1st row.

library(dplyr)

df %>%
  mutate(temp_col = paste(b, c, d, sep = "-")) %>%
  group_by(group = data.table::rleid(temp_col)) %>%
  mutate_at(vars(b, c, d), sum) %>%
  slice(1L) %>%
  ungroup %>%
  select(-temp_col, -group)

#  Time       a         b     c     d    id
#  <fct>      <fct> <int> <int> <int> <int>
#1 2014/10/11 A        80    40    20     1
#2 2014/10/13 B         9    10     9     3
#3 2014/10/14 D        16     5    12     4
#4 2014/10/15 D         1     6     5     5
#5 2014/10/16 B        40    14    16     6
#6 2014/10/18 A        11     9     5     8
#7 2014/10/19 C        31    20    23     9

data

df <- structure(list(Time = structure(1:9, .Label = c("2014/10/11", 
"2014/10/12", "2014/10/13", "2014/10/14", "2014/10/15", "2014/10/16", 
"2014/10/17", "2014/10/18", "2014/10/19"), class = "factor"), 
a = structure(c(1L, 1L, 2L, 4L, 4L, 2L, 2L, 1L, 3L), .Label = c("A", 
"B", "C", "D"), class = "factor"), b = c(40L, 40L, 9L, 16L, 
1L, 20L, 20L, 11L, 31L), c = c(20L, 20L, 10L, 5L, 6L, 7L, 
7L, 9L, 20L), d = c(10L, 10L, 9L, 12L, 5L, 8L, 8L, 5L, 23L
), id = 1:9), class = "data.frame", row.names = c("1", "2", 
"3", "4", "5", "6", "7", "8", "9"))
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • Sorry the code worked on my data but also deleted other rows with the same id, is there a way to just delete the second row after summing up the columns for b,c,d and nothing else changes? Thanks – nilsinelabore Sep 03 '19 at 05:41
  • @Harper can you change the `slice` line from `slice(1L)` to `slice(if (n() > 1) -2L else 1L)` and see if this is what you want ? – Ronak Shah Sep 03 '19 at 05:54