How can I merge two (or more) rows of an R data frame when one column has duplicate strings?

Question

I have another R problem to solve. I have data similar to the following.

SN <- c(1001, 1001, 1001, 1002, 1002, 1002, 1003, 1003, 1003)
category <- c("fender", "chassis", "mirror", "fender", "mirror", "fender", "mirror", "fender", "chassis")
note <- c("rust", "rust", "chip", "crack", "broken", "rust", "missing", "dent", "normal")
df <- data.frame(SN, category, note)
df

which produces the following data frame.

 SN category    note
1 1001   fender    rust
2 1001  chassis    rust
3 1001   mirror    chip
4 1002   fender   crack
5 1002   mirror  broken
6 1002   fender    rust
7 1003   mirror missing
8 1003   fender    dent
9 1003  chassis  normal

Here is what I WANT to have:

I would like be able combine rows that have both a matching SN and category, concatenating the notes. So for SN 1002, where there are two "fender" categories, those would be combined into one row, resulting in a data frame that looks like this:

>df1 
SN1 category1         note1
1 1001    fender          rust
2 1001   chassis          rust
3 1001    mirror          chip
4 1002    fender crack -- rust
5 1002    mirror        broken
6 1003    mirror       missing
7 1003    fender          dent
8 1003   chassis        normal

My reason for this is that I need to be able to use pivot_wider in order to produce a table that is kind of like this:

# A tibble: 3 x 4
    SN1 fender        chassis mirror 
  <dbl> <chr>         <chr>   <chr>  
1  1001 rust          rust    chip   
2  1002 crack - rust NA      broken 
3  1003 dent          normal  missing

Please keep in mind that my real data is hundreds of SNs each with a few dozen categories each, resulting in several thousand rows of data, so I need a solution that doesn't involve me having to go through and search around for duplicate categories within a SN, but instead can handle things more automatically. Hope somebody can help. Thanks a bunch!

`tidyr::pivot_wider(df, names_from = category, values_from = note, values_fn = function(x) glue::glue_collapse(x, sep = " - "))` Or `pivot_wider(df, names_from = category, values_from = note, values_fn = function(x) paste(x, collapse = " - "))` — Phil, Sep 22 '21 at 00:56

score 0 · Answer 1 · answered Sep 22 '21 at 00:58

library(dplyr)

df %>% 
  group_by(SN,category) %>% 
  summarise(
    note = paste0(note,collapse = " -- ")
  )

# A tibble: 8 x 3
# Groups:   SN [3]
     SN category note         
  <dbl> <chr>    <chr>        
1  1001 chassis  rust         
2  1001 fender   rust         
3  1001 mirror   chip         
4  1002 fender   crack -- rust
5  1002 mirror   broken       
6  1003 chassis  normal       
7  1003 fender   dent         
8  1003 mirror   missing

How can I merge two (or more) rows of an R data frame when one column has duplicate strings?

1 Answers1