Adding together rows from duplicate entries in a dataframe (Excel or R)

Question

I have a dataframe that contains some duplicate entries (around 100 of them). The data looks like this:

Data                   V1       V2      V3      V4 
Cellulomonas uda      0.2       0.0     0.0     0.1
Cellulomonas uda      0.0       0.1     0.3     0.1

I would like to find all the duplicates in the dataframe and add their rows together, to give this:

Data                   V1       V2      V3      V4 
Cellulomonas uda      0.2       0.1     0.3     0.2

Is there a function in dplyr that could help with this? Alternatively, a way to add the rows together in Excel and then manually delete one of the duplicates would be fine.

Consider `na.rm = TRUE` for the `sum` function. – Andre Elrico Mar 19 '18 at 11:46
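To illustrate that comment: if any V column contains `NA`, `sum()` returns `NA` for the whole group unless `na.rm = TRUE` is passed. A minimal sketch using the `group_by()`/`summarise_all()` approach from the accepted answer, with a hypothetical `df_na` that has one missing value in `V2`:

```r
library(dplyr)

# Hypothetical example data: one missing value in V2
df_na <- data.frame(
  Data = c("Cellulomonas uda", "Cellulomonas uda"),
  V1 = c(0.2, 0.0),
  V2 = c(NA, 0.1),
  stringsAsFactors = FALSE
)

# Without na.rm, the NA propagates and the V2 total is NA
without_narm <- df_na %>% group_by(Data) %>% summarise_all(sum)
without_narm

# Extra arguments to summarise_all() are passed on to sum(),
# so na.rm = TRUE skips the missing value
result <- df_na %>% group_by(Data) %>% summarise_all(sum, na.rm = TRUE)
result
```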

moodymudskipper · Accepted Answer · 2018-03-19T23:56:33.163

3

You can take the sum of the V columns for each value of Data:

df1 <- read.table(text = "Data                   V1       V2      V3      V4
'Cellulomonas uda'      0.2       0.0     0.0     0.1
'Cellulomonas uda'      0.0       0.1     0.3     0.1",
                  header = TRUE, stringsAsFactors = FALSE)

library(dplyr)

# Group rows by name, then sum every remaining column within each group
df1 %>% group_by(Data) %>% summarize_all(sum)
# # A tibble: 1 x 5
#                 Data    V1    V2    V3    V4
#                <chr> <dbl> <dbl> <dbl> <dbl>
#   1 Cellulomonas uda   0.2   0.1   0.3   0.2
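In dplyr 1.0.0 and later, `summarize_all()` still works but is marked as superseded; `across()` is the recommended way to apply one function to several columns. A sketch of the equivalent call, assuming dplyr >= 1.0.0 and the same example data rebuilt here so the block runs on its own:

```r
library(dplyr)

df1 <- data.frame(
  Data = c("Cellulomonas uda", "Cellulomonas uda"),
  V1 = c(0.2, 0.0), V2 = c(0.0, 0.1),
  V3 = c(0.0, 0.3), V4 = c(0.1, 0.1),
  stringsAsFactors = FALSE
)

# across() skips the grouping column, so everything() selects V1..V4 here
totals <- df1 %>%
  group_by(Data) %>%
  summarise(across(everything(), sum))
totals
```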

edited Mar 19 '18 at 23:56

answered Mar 19 '18 at 11:22

moodymudskipper


`summarise_all(sum)` might be more appropriate for the OP's question. – hpesoj626 Mar 19 '18 at 11:25
This hasn't added them together though. Oh sorry Mudskipper, I just saw your comment. The values are not duplicated, just some of the names in the "Data" column. The values in V1, etc. are unique. – CodingIsHardMan Mar 19 '18 at 11:28
Yes, using `sum` seems to work fine. Thank you both – CodingIsHardMan Mar 19 '18 at 11:34
alright edited, thanks @hpesoj626 – moodymudskipper Mar 19 '18 at 11:35
I think `V4` should have a value of `0.2` – tyluRp Mar 19 '18 at 11:49
edited again ;) – moodymudskipper Mar 19 '18 at 11:56
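As the comments note, only the names in Data repeat while the V values differ. To list which names occur more than once before summing, base R's `duplicated()` can help. A minimal sketch on hypothetical data with one non-duplicated row (`Bacillus subtilis` is illustrative only):

```r
# Hypothetical example: one name appears twice, another once
df2 <- data.frame(
  Data = c("Cellulomonas uda", "Bacillus subtilis", "Cellulomonas uda"),
  V1 = c(0.2, 0.5, 0.0),
  stringsAsFactors = FALSE
)

# duplicated() flags the second and later occurrences; unique() lists each name once
dup_names <- unique(df2$Data[duplicated(df2$Data)])
dup_names

# All rows whose name is duplicated, including the first occurrence
dup_rows <- df2[df2$Data %in% dup_names, ]
dup_rows
```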

tyluRp · Answer 2 · 2018-03-19T11:45:08.523

2

With base R, we could use aggregate:

aggregate(. ~ Data, df1, sum)

              Data  V1  V2  V3  V4
1 Cellulomonas uda 0.2 0.1 0.3 0.2
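One caveat with the formula form of `aggregate`: its default `na.action = na.omit` drops any row containing an `NA` before grouping, so values in the other columns of that row are lost from the sums. A sketch on hypothetical data showing the difference, keeping such rows with `na.action = na.pass` and letting `sum()` skip the missing values through `na.rm = TRUE`:

```r
# Hypothetical example data: one missing value in V2
df_na <- data.frame(
  Data = c("Cellulomonas uda", "Cellulomonas uda"),
  V1 = c(0.2, 0.0),
  V2 = c(NA, 0.1),
  stringsAsFactors = FALSE
)

# Default: the first row is dropped entirely, so its V1 value of 0.2 is lost
default_agg <- aggregate(. ~ Data, df_na, sum)
default_agg

# Keep rows with NAs; na.rm = TRUE is passed through to sum()
agg <- aggregate(. ~ Data, df_na, sum, na.rm = TRUE, na.action = na.pass)
agg
```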

And with data.table, I think we could do:

library(data.table)

dt <- as.data.table(df1)  # convert the data frame from above
dt[, lapply(.SD, sum), by = Data]

               Data  V1  V2  V3  V4
1: Cellulomonas uda 0.2 0.1 0.3 0.2
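If the table also has non-numeric columns besides Data, `sum()` would fail on them; data.table's `.SDcols` argument restricts which columns go into `.SD`. A sketch assuming the value columns all start with "V" (the `Note` column here is hypothetical):

```r
library(data.table)

dt <- data.table(
  Data = c("Cellulomonas uda", "Cellulomonas uda"),
  Note = c("a", "b"),  # hypothetical text column that must not be summed
  V1 = c(0.2, 0.0), V2 = c(0.0, 0.1),
  V3 = c(0.0, 0.3), V4 = c(0.1, 0.1)
)

# Select only the value columns by name pattern
v_cols <- grep("^V", names(dt), value = TRUE)

# Sum just those columns within each Data group
res <- dt[, lapply(.SD, sum), by = Data, .SDcols = v_cols]
res
```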

edited Mar 19 '18 at 11:45

answered Mar 19 '18 at 11:38

tyluRp

