0

I have a tibble with a thousand something rows, and two columns, the first of which contains a categorical label, and the second a character string. Here is a MRE

tab <- tibble(category = c("CAT1", "CAT1", "CAT1", "CAT1", "CAT1", "CAT2", "CAT2", "CAT2", "CAT2"),
              word = c("Lorem", "ipsum", "dolor", "sit", "amet", "Consectetur", "adipiscing", "elit", "nam"))
tab

# A tibble: 9 x 2
  category word       
  <chr>    <chr>      
1 CAT1     Lorem      
2 CAT1     ipsum      
3 CAT1     dolor      
4 CAT1     sit        
5 CAT1     amet       
6 CAT2     Consectetur
7 CAT2     adipiscing 
8 CAT2     elit       
9 CAT2     nam

Now, what I'd like to do is to collapse these rows in order to have only one row per category and all the words for that category together in a single cell, separated by a semi-colon. Like that:

# A tibble: 2 x 2
  category word                              
  <chr>    <chr>                             
1 CAT1     Lorem; ipsum; dolor; sit; amet    
2 CAT2     Consectetur; adipiscing; elit; nam

Does anyone know how I could tackle this issue and is willing to fly my rescue?

SebasComm
  • 35
  • 4
  • 1
    Does this answer your question? [Collapse / concatenate / aggregate a column to a single comma separated string within each group](https://stackoverflow.com/questions/15933958/collapse-concatenate-aggregate-a-column-to-a-single-comma-separated-string-w) – nniloc Sep 17 '21 at 17:38

2 Answers2

2

We can use

library(dplyr)
library(stringr)
tab %>%
    group_by(category) %>%
    summarise(word = str_c(word, collapse ="; "))

-output

# A tibble: 2 x 2
  category word                              
  <chr>    <chr>                             
1 CAT1     Lorem; ipsum; dolor; sit; amet    
2 CAT2     Consectetur; adipiscing; elit; nam
akrun
  • 874,273
  • 37
  • 540
  • 662
0

We can also pivot_wider, then unite the desired columns:

library(dplyr)
library(tidyr)

tab %>% pivot_wider(names_from = word, values_from = word) %>%
        unite(col='word', -category, sep=', ', na.rm = TRUE)

# A tibble: 2 x 2
  category word                              
  <chr>    <chr>                             
1 CAT1     Lorem, ipsum, dolor, sit, amet    
2 CAT2     Consectetur, adipiscing, elit, nam
GuedesBF
  • 8,409
  • 5
  • 19
  • 37