0

I would like to transform a long data set with repeated observations:

obs code
1    A
2    B
4    G
2    D
1    H
3    K

Into a "shorter" data set with a comma separated, summary column:

obs  code
1    A,H
2    B,D
3    K
4    G

I tried something like:

df <- data.frame(obs=c("1","2","4","2","1","3"), code=c("A","B","G","D","H","K"),stringsAsFactors = F)


df %>% group_by(obs) %>%
  mutate(id=1:n()) %>% 
  spread(id,code) %>%
  replace_na(list(`1` = "", `2` = "")) %>%
  unite(new,2:3, remove=FALSE,sep=",")

However, this gives me additional "," for obs 3 and 4. Is there a nicer was to solve my problem?

captcoma
  • 1,768
  • 13
  • 29

2 Answers2

3

Instead of spreading to 'wide' format and then using replace_na, this can be done more directly by pasteing the 'code' in summarise after the group_by step

df %>%
  group_by(obs) %>%
  summarise(code = toString(code))
akrun
  • 874,273
  • 37
  • 540
  • 662
1

Here is a base R solution using aggregate as an alternative:

with(df, aggregate(code, by = list(obs = obs), toString));
#  obs    x
#1   1 A, H
#2   2 B, D
#3   3    K
#4   4    G

If you don't want the extra space between codes, you can paste0 entries (instead of using toString):

with(df, aggregate(code, by = list(obs = obs), paste0, collapse = ","));
#  obs   x
#1   1 A,H
#2   2 B,D
#3   3   K
#4   4   G
Maurits Evers
  • 49,617
  • 4
  • 47
  • 68