-1

I have data like:

    V1    V2    V3    V4    V5
1   A     5.7    f    n     j47
2   A     5.7    f    n     j44
3   A     5.7    f    n     e54
4   B     5.1    m    y     j47
5   B     5.1    m    y     r10

I want my output to like like:

1:    A    5.7    f    n    j47, j44, e54
2;    B    5.1    m    y    j47, r10

ie only v5 vaires but I want to collapse my data so that all values for V5 for each identifier (given in V1) can be seen.

How can I do this?

Sam Lipworth
  • 107
  • 4
  • 12
  • 1
    `df %>% group_by_at(vars(V1:V4)) %>% summarise_all(toString)` – alistaire Dec 07 '17 at 00:31
  • 3
    Possible duplicate of [Collapse / concatenate / aggregate a column to a single comma separated string within each group](https://stackoverflow.com/questions/15933958/collapse-concatenate-aggregate-a-column-to-a-single-comma-separated-string-w) – Ronak Shah Dec 07 '17 at 01:14
  • Please use `dput()` to show your data! – jogo Feb 16 '18 at 14:10

2 Answers2

2

Using tidyr which is also part of the tidyverse along with dplyr: (Edit Note this adds a list column rather than collapsing V5 into a single string for each group)

library(tidyr)

nest(df, V5)    
#>   V1  V2 V3 V4          data
#> 1  A 5.7  f  n j47, j44, e54
#> 2  B 5.1  m  y      j47, r10

Data used

df <- (read.table(text = "V1    V2    V3    V4    V5
1   A     5.7    f    n     j47
2   A     5.7    f    n     j44
3   A     5.7    f    n     e54
4   B     5.1    m    y     j47
5   B     5.1    m    y     r10", header = TRUE, stringsAsFactors = FALSE))
markdly
  • 4,394
  • 2
  • 19
  • 27
  • This is pretty neat but it removes the letters (which are icd10 codes, hence critical) – Sam Lipworth Dec 07 '17 at 09:39
  • @SamLipworth, I'm not quite sure what you mean by the letters being removed. Expanding my answer above to `library(tidyverse); nest(df, V5, .key = "V5") %>% mutate(V5 = map_chr(V5, ~ paste0(.$V5, collapse = ", ")))` should give you an answer equivalent to the other answers provided (although not as elegantly!). I hope this helps – markdly Dec 07 '17 at 10:38
2

Another solution using base R's aggregate (no additional libraries necessary):

aggregate(data = df, V5 ~ V1 + V2 + V3 + V4, paste, collapse = ", ")
#  V1  V2 V3 V4            V5
#1  A 5.7  f  n j47, j44, e54
#2  B 5.1  m  y      j47, r10

Data

df <- read.table(text = 
    "V1    V2    V3    V4    V5
     1   A     5.7    f    n     j47
     2   A     5.7    f    n     j44
     3   A     5.7    f    n     e54
     4   B     5.1    m    y     j47
     5   B     5.1    m    y     r10", header = T)
Maurits Evers
  • 49,617
  • 4
  • 47
  • 68