I have a data frame that contains around 100 duplicated rows. The data looks like this:

Data                   V1       V2      V3      V4 
Cellulomonas uda      0.2       0.0     0.0     0.1
Cellulomonas uda      0.0       0.1     0.3     0.1

But I would like to find all the duplicates in the dataframe and add them together, to give this:

Data                   V1       V2      V3      V4 
Cellulomonas uda      0.2       0.1     0.3     0.2

Is there a function in dplyr that could help with this? Even a way to add the rows together in Excel and then manually delete one of the duplicates would be fine.

2 Answers

You can group by Data and take the sum of each V column within each group:

df1 <- read.table(text = "Data                   V1       V2      V3      V4
'Cellulomonas uda'      0.2       0.0     0.0     0.1
'Cellulomonas uda'      0.0       0.1     0.3     0.1", header = TRUE, stringsAsFactors = FALSE)

library(dplyr)

df1 %>% group_by(Data) %>% summarize_all(sum)
# # A tibble: 1 x 5
#                 Data    V1    V2    V3    V4
#                <chr> <dbl> <dbl> <dbl> <dbl>
#   1 Cellulomonas uda   0.2   0.1   0.3   0.2
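
In dplyr 1.0.0 and later, the scoped verbs such as summarize_all() are superseded by across(). A minimal sketch of the same grouping step in that style follows; the second species and its values are hypothetical and only there to show that non-duplicated rows pass through unchanged.

```r
library(dplyr)

# Hypothetical example data: one duplicated species plus one unique row
df1 <- data.frame(
  Data = c("Cellulomonas uda", "Cellulomonas uda", "Bacillus subtilis"),
  V1 = c(0.2, 0.0, 0.5),
  V2 = c(0.0, 0.1, 0.0),
  V3 = c(0.0, 0.3, 0.1),
  V4 = c(0.1, 0.1, 0.0),
  stringsAsFactors = FALSE
)

# Sum every numeric column within each Data group;
# .groups = "drop" returns an ungrouped tibble
result <- df1 %>%
  group_by(Data) %>%
  summarise(across(where(is.numeric), sum), .groups = "drop")

print(result)
```

If some cells may be missing, passing na.rm = TRUE through an anonymous function, for example across(where(is.numeric), function(x) sum(x, na.rm = TRUE)), is one way to keep the sums from turning into NA.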
moodymudskipper

With base R we could use aggregate:

aggregate(. ~ Data, df1, sum)

              Data  V1  V2  V3  V4
1 Cellulomonas uda 0.2 0.1 0.3 0.2
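
Since the question also asks how to find the duplicates, it may help to inspect them before collapsing anything. The sketch below uses base R's duplicated() in both directions so that every copy of a repeated Data value is flagged, not only the later ones. The data frame is a hypothetical cut-down version of the one in the question.

```r
# Hypothetical example data: one duplicated species plus one unique row
df1 <- data.frame(
  Data = c("Cellulomonas uda", "Bacillus subtilis", "Cellulomonas uda"),
  V1 = c(0.2, 0.5, 0.0),
  V2 = c(0.0, 0.0, 0.1),
  stringsAsFactors = FALSE
)

# duplicated() marks only the second and later copies, so combine it with
# a scan from the end to flag the first copy as well
is_dup <- duplicated(df1$Data) | duplicated(df1$Data, fromLast = TRUE)

# Rows that would be merged by the aggregation
dup_rows <- df1[is_dup, ]

# Number of copies of each duplicated Data value
dup_counts <- table(df1$Data[is_dup])

print(dup_rows)
print(dup_counts)
```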

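And with data.table, after converting df1 to a data.table, I think we could do:

library(data.table)

# dt was not defined above; create it from df1
dt <- as.data.table(df1)

dt[, lapply(.SD, sum), by = Data]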

              Data  V1  V2  V3  V4
1 Cellulomonas uda 0.2 0.1 0.3 0.2
tyluRp