3

I have a df that looks like the follow:

ID   LOC
1     A
1     A
2     A
2     B
3     A
3     A
3     A
4     A
4     B
4     C

What I would like to do is collapse the LOC values to one row per ID and if they are the same keep it as one value, so my output would be:

ID   LOC
1     A
2     A + B
3     A
4     A + B + C

Right now I am using:

group_by(ID) %>%
mutate(concat_LOC = paste0(LOC, collapse = " + ")) 

which concats all values even if they are the same

Konrad Rudolph
  • 530,221
  • 131
  • 937
  • 1,214
user3249770
  • 329
  • 2
  • 7

2 Answers2

5
  1. You're reducing/summarizing, so you should use summarize instead of mutate.
  2. Since you do want to repeat LOC values, use unique.
dat %>%
  group_by(ID) %>%
  summarize(LOC = paste(unique(LOC), collapse = " + ")) %>%
  ungroup()
# # A tibble: 4 x 2
#      ID LOC      
#   <int> <chr>    
# 1     1 A        
# 2     2 A + B    
# 3     3 A        
# 4     4 A + B + C
r2evans
  • 141,215
  • 6
  • 77
  • 149
0

There are many ways to skin a cat in R, here is a Base, tidyverse, and data.table way:

# Base R solution: res => data.frame
res <- aggregate(
  LOC ~ ID,
  df, 
  FUN = function(x){
    paste0(unique(x), collapse = ' + ')
  }
)

# Output data.frame to console: data.frame => stdout(console) 
res

# Tidyverse method: tv_res => tibble
library(tidyverse)
tv_res <- df %>% 
  group_by(ID) %>% 
  distinct() %>% 
  summarise(LOC = str_c(LOC, collapse = ' + ')) %>% 
  ungroup()


# Print the data.frame to the console: 
# tibble => stdout(console)
tv_res

# Data.table method: 
# Coerce data.frame to data.table: dt => data.table object
dt <- data.table(df)

# Aggregate the data.table: dt_res => data.table object
dt_res <- dt[, 
          list(LOC = paste(unique(LOC), collapse = ' + ')),
          by = ID]

# Print the data to the console: data.table => console(stdout)
dt_res

Data used:

# Import the data: df => data.frame 
df <- structure(list(ID = c(1L, 1L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L), 
LOC = c("A", "A", "A", "B", "A", "A", "A", "A", "B", "C")), 
class = "data.frame", row.names = c(NA, -10L))
hello_friend
  • 5,682
  • 1
  • 11
  • 15