R - sum value and merge column based on condition

Question

I have the following dataframe:

df1 <- data.frame( id = c(1,2,2,3),
                   word = c("house, garden, flower", "flower, red", "garden, tree, forest", "house, window, door, red"),
                   value = c(10,12,20,5),
                   stringsAsFactors = FALSE
)

Now I want to consolidate rows based on id. So if there is a duplicated id, the values in column word should be merged and the column value should be summed up. This means the df should look like this:

id | word                              | value
1  | house, garden, flower             | 10
2  | flower, red, garden, tree, forest | 32
3  | house, window, door, red          | 5

Does anybody has an idea, how to solve this problem?

The value of id == 2 should be 32, right? – cirofdo May 29 '18 at 12:46 — cirofdo, May 29 '18 at 12:46
You are right! Sorry, I corrected it. – nississippi May 29 '18 at 12:49 — nississippi, May 29 '18 at 12:49

score 2 · Accepted Answer · edited May 29 '18 at 12:56

In base R:

df1 <- data.frame( id = c(1,2,2,3),
                   word = c("house, garden, flower", "flower, red", "garden, tree, forest", "house, window, door, red"),
                   value = c(10,12,20,5),
                   stringsAsFactors = FALSE
)

want <- data.frame(id = unique(df1$id),
                   word = tapply(df1$word, df1$id, paste, collapse = ", "),
                   value = tapply(df1$value, df1$id, sum))
want

  id                              word value
1  1             house, garden, flower    10
2  2 flower, red, garden, tree, forest    32
3  3          house, window, door, red     5

divibisan · Answer 2 · 2018-05-30T12:59:09.520

0

This is pretty simple using tidyverse. Just group by id and then use the summarize function to generate the group variables you want:

library(tidyverse)
df1 %>%
    group_by(id) %>%
    dplyr::summarize(word = paste0(word, collapse=", "),
              value = sum(value))

     id word                              value
  <dbl> <chr>                             <dbl>
1     1 house, garden, flower                10
2     2 flower, red, garden, tree, forest    32
3     3 house, window, door, red              5

edited May 30 '18 at 12:59

answered May 29 '18 at 12:49

divibisan

11,659
11
40
58

I dont know what I am doing wrong, but with this code all words are put together in one row instead of ordering it to the id. So I get a df with only one row: word and value and everything is summed up. And I just copy paste your code... – nississippi May 30 '18 at 07:21
You have the package `plyr` loaded and so you're using that version of `summarize`. If you specify `dplyr::summarize` or unload that package then it will work as expected. – divibisan May 30 '18 at 12:59

score 0 · Answer 3 · answered May 29 '18 at 12:51

0

Simply like this with dplyr package:

library(dplyr)

df1 %>%
  group_by(id) %>%
  summarise(
    word = paste(word, collapse = ', '),
    value=sum(value)
  )

Output:

# A tibble: 3 x 3
     id word                              value
  <dbl> <chr>                             <dbl>
1    1. house, garden, flower               10.
2    2. flower, red, garden, tree, forest   32.
3    3. house, window, door, red             5.

answered May 29 '18 at 12:51

cirofdo

1,074
6
22

I dont know what I am doing wrong, but with this code all words are put together in one row instead of ordering it to the id. So I get a df with only one row: word and value and everything is summed up. And I just copy paste your code... – nississippi May 30 '18 at 07:21

R - sum value and merge column based on condition

3 Answers3

Linked