Collapse column value into one if different value

Question

I have a df that looks like the follow:

What I would like to do is collapse the LOC values to one row per ID and if they are the same keep it as one value, so my output would be:

ID   LOC
1     A
2     A + B
3     A
4     A + B + C

Right now I am using:

group_by(ID) %>%
mutate(concat_LOC = paste0(LOC, collapse = " + "))

which concats all values even if they are the same

score 5 · Accepted Answer · answered Oct 19 '21 at 21:20

5

You're reducing/summarizing, so you should use summarize instead of mutate.
Since you do want to repeat LOC values, use unique.

dat %>%
  group_by(ID) %>%
  summarize(LOC = paste(unique(LOC), collapse = " + ")) %>%
  ungroup()
# # A tibble: 4 x 2
#      ID LOC      
#   <int> <chr>    
# 1     1 A        
# 2     2 A + B    
# 3     3 A        
# 4     4 A + B + C

answered Oct 19 '21 at 21:20

r2evans

141,215
6
77
149

FYI, very related: https://stackoverflow.com/q/11562656/3358272 – r2evans Oct 19 '21 at 21:34

score 0 · Answer 2 · answered Oct 19 '21 at 22:56

There are many ways to skin a cat in R, here is a Base, tidyverse, and data.table way:

# Base R solution: res => data.frame
res <- aggregate(
  LOC ~ ID,
  df, 
  FUN = function(x){
    paste0(unique(x), collapse = ' + ')
  }
)

# Output data.frame to console: data.frame => stdout(console) 
res

# Tidyverse method: tv_res => tibble
library(tidyverse)
tv_res <- df %>% 
  group_by(ID) %>% 
  distinct() %>% 
  summarise(LOC = str_c(LOC, collapse = ' + ')) %>% 
  ungroup()


# Print the data.frame to the console: 
# tibble => stdout(console)
tv_res

# Data.table method: 
# Coerce data.frame to data.table: dt => data.table object
dt <- data.table(df)

# Aggregate the data.table: dt_res => data.table object
dt_res <- dt[, 
          list(LOC = paste(unique(LOC), collapse = ' + ')),
          by = ID]

# Print the data to the console: data.table => console(stdout)
dt_res

Data used:

# Import the data: df => data.frame 
df <- structure(list(ID = c(1L, 1L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L), 
LOC = c("A", "A", "A", "B", "A", "A", "A", "A", "B", "C")), 
class = "data.frame", row.names = c(NA, -10L))

Collapse column value into one if different value

2 Answers2