2

I have the following dataframe:

df1 <- data.frame( id = c(1,2,2,3),
                   word = c("house, garden, flower", "flower, red", "garden, tree, forest", "house, window, door, red"),
                   value = c(10,12,20,5),
                   stringsAsFactors = FALSE
)

Now I want to consolidate rows based on id. So if there is a duplicated id, the values in column word should be merged and the column value should be summed up. This means the df should look like this:

id | word                              | value
1  | house, garden, flower             | 10
2  | flower, red, garden, tree, forest | 32
3  | house, window, door, red          | 5

Does anybody has an idea, how to solve this problem?

nississippi
  • 327
  • 2
  • 17

3 Answers3

2

In base R:

df1 <- data.frame( id = c(1,2,2,3),
                   word = c("house, garden, flower", "flower, red", "garden, tree, forest", "house, window, door, red"),
                   value = c(10,12,20,5),
                   stringsAsFactors = FALSE
)

want <- data.frame(id = unique(df1$id),
                   word = tapply(df1$word, df1$id, paste, collapse = ", "),
                   value = tapply(df1$value, df1$id, sum))
want

  id                              word value
1  1             house, garden, flower    10
2  2 flower, red, garden, tree, forest    32
3  3          house, window, door, red     5
divibisan
  • 11,659
  • 11
  • 40
  • 58
r.user.05apr
  • 5,356
  • 3
  • 22
  • 39
0

This is pretty simple using tidyverse. Just group by id and then use the summarize function to generate the group variables you want:

library(tidyverse)
df1 %>%
    group_by(id) %>%
    dplyr::summarize(word = paste0(word, collapse=", "),
              value = sum(value))

     id word                              value
  <dbl> <chr>                             <dbl>
1     1 house, garden, flower                10
2     2 flower, red, garden, tree, forest    32
3     3 house, window, door, red              5
divibisan
  • 11,659
  • 11
  • 40
  • 58
  • I dont know what I am doing wrong, but with this code all words are put together in one row instead of ordering it to the id. So I get a df with only one row: word and value and everything is summed up. And I just copy paste your code... – nississippi May 30 '18 at 07:21
  • You have the package `plyr` loaded and so you're using that version of `summarize`. If you specify `dplyr::summarize` or unload that package then it will work as expected. – divibisan May 30 '18 at 12:59
0

Simply like this with dplyr package:

library(dplyr)

df1 %>%
  group_by(id) %>%
  summarise(
    word = paste(word, collapse = ', '),
    value=sum(value)
  )

Output:

# A tibble: 3 x 3
     id word                              value
  <dbl> <chr>                             <dbl>
1    1. house, garden, flower               10.
2    2. flower, red, garden, tree, forest   32.
3    3. house, window, door, red             5.
cirofdo
  • 1,074
  • 6
  • 22
  • I dont know what I am doing wrong, but with this code all words are put together in one row instead of ordering it to the id. So I get a df with only one row: word and value and everything is summed up. And I just copy paste your code... – nississippi May 30 '18 at 07:21