0

I'm new here, so I've tried to make this explanation and reprex as simple as possible following the guidelines, but apologies for any mistakes or etiquette boobys I've made, though I've tried to avoid them - I'm learning!

So in R I have two columns, ID and Area. Neither of these are unique values. Sometimes ID will match up to more than one Area, as shown by "ABC" below.

data.frame(ID=c("ABC", "def", "ghi", "ABC", "jkl", "jkl"),
Area=c("area1", "area2", "area3", "area4", "area5", "area5"))

How do I create a third column that concatenates all possible entries of Area for each ID, so that it looks like:

data.frame(ID=c("ABC", "def", "ghi", "ABC", "jkl", "jkl"),
Area=c( "area1", "area2", "area3", "area4", "area5", "area5"),
AreaComb=c("area1 & area4", "area2", "area3", "area1 & area4", "area5", "area 5"))

I would greatly appreciate any help I could get with this. I am learning R using DataCamp and have not got this far yet!

Edit: I should have said that I need the concatenation to just include all possible UNIQUE values, ie, entries under ID=="jkl" should only have "area5" in AreaComb, not "area5 & area5", etc.

Zanskobia
  • 3
  • 2
  • Welcome @Zanskobia, FYI you probably shouldn't have Rstudio in you question title as this is a general R problem and not something specific to the Rstudio IDE. – Richard J. Acton Sep 04 '19 at 13:47
  • Hi all, I've updated my question and reprexes, as your solutions caused some entries in AreaComb to loop over and over, eg "area 5 & area5", etc. Based on this correction (apologies) do you know how I could approach this problem? – Zanskobia Sep 04 '19 at 14:38
  • @Zanskobia If both the areas are exactly same. ("area5" is not same as "area 5" though) we can just add `unique` to it before pasting. I have updated the answer. – Ronak Shah Sep 04 '19 at 14:47
  • Take a look through solutions [here](https://stackoverflow.com/q/15933958/5325862); they don't include the step of deduplicating, but should get you started – camille Sep 04 '19 at 15:05
  • Possible duplicate of [Concatenate unique strings after groupby in R](https://stackoverflow.com/questions/51090048/concatenate-unique-strings-after-groupby-in-r) – camille Sep 04 '19 at 15:07

3 Answers3

1

Using dplyr, we can group_by ID and paste Area together for each group.

library(dplyr)

df %>%
  group_by(ID) %>%
  mutate(AreaComb = paste(unique(Area), collapse = " & "))

#   ID    Area  AreaComb     
#  <fct> <fct> <chr>        
#1 ABC   area1 area1 & area4
#2 def   area2 area2        
#3 ghi   area3 area3        
#4 ABC   area4 area1 & area4
#5 jkl   area5 area5       

data

df <- data.frame(ID=c("ABC", "def", "ghi", "ABC", "jkl", "jkl"),
       Area=c("area1", "area2", "area3", "area4", "area5", "area5"))
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
1

We can use tidyverse

library(dplyr)
library(stringr)
df %>%
   group_by(ID) %>%
   mutate(AreaComb = str_c(Area, collapse = " & "))
# A tibble: 5 x 3
# Groups:   ID [4]
#  ID    Area  AreaComb     
#  <fct> <fct> <chr>        
#1 ABC   area1 area1 & area4
#2 def   area2 area2        
#3 ghi   area3 area3        
#4 ABC   area4 area1 & area4
#5 jkl   area5 area5        

Or with data.table

library(data.table)
setDT(df)[, AreaComb := paste(Area, collapse= " & "), by = ID]

data

df <- data.frame(ID=c("ABC", "def", "ghi", "ABC", "jkl"),
             Area=c("area1", "area2", "area3", "area4", "area5")) 
akrun
  • 874,273
  • 37
  • 540
  • 662
1

Make sure your strings are character (i.e. NOT factors) and use ave, i.e.

dd <- data.frame(ID = c("ABC", "def", "ghi", "ABC", "jkl"),
                Area = c("area1", "area2", "area3", "area4", "area5"), 
                stringsAsFactors = FALSE) #<--- Notice this argument

with(dd, ave(Area, ID, FUN = function(i)paste(i, collapse = ' & ')))
#[1] "area1 & area4" "area2"         "area3"         "area1 & area4" "area5"    
Sotos
  • 51,121
  • 6
  • 32
  • 66