Hello great and omnificent stackoverflow,

I have a data frame structured like so.

Person     Dilution      Analyte     Meta#1      Meta#2
john         1            Blank       3x          100
john         2            Blank       3x          100
john         1            mulv        3x          100
john         2            mulv        3x          100
john         1            gp41        3x          100
john         2            gp41        3x          100
kelly        20           blank       3x          100
kelly        20           gp41        3x          100

There could be many people, each with many different dilutions and analytes. The meta information is always the same down each column. I would like to produce the following data frame, with one row per person:

Person     Dilution      Analyte            Meta#1      Meta#2
john        1,2          Blank,mulv,gp41     3x          100
kelly       20           blank,gp41          3x          100

I was wondering if anyone knows of any nifty tricks for concatenating information such as this?

r3vdev
  • [Google](https://www.google.co.il/?ion=1&espv=2&client=ubuntu#q=concatenate+by+group+r) – David Arenburg Aug 29 '16 at 15:45
  • @DavidArenburg and @akrun, it's a duplicate, but data.table, dplyr and aggregate are not used in the answers to the other question. One of the answers to the other question uses `plyr`, which has been superseded by other Hadleyverse packages. What do you think is the best course of action when you have a duplicate question, but answers to the original question are outdated or don't include "state of the art" methods: Should I (1) still answer the duplicate question with the new or updated methods, (2) add the new or updated methods as an answer to the original question, (3) both, (4) neither? – eipi10 Aug 30 '16 at 16:46
  • @eipi10 I've closed with a better dupe. But the whole Google page is full of questions/answers with those packages too – David Arenburg Aug 30 '16 at 16:55
  • @DavidArenburg Sounds good, but what do you think about the general issue when the original question doesn't have up-to-date or state-of-the-art answers? (Feel free to move this to chat if you want to discuss.) – eipi10 Aug 30 '16 at 16:57
  • @eipi10 You should close as a dupe and add an answer in the target dupe. This way the target will have all the information instead of it being scattered all over the site. AFAIK, this is why SO encourages closing questions as dupes in the first place. – David Arenburg Aug 30 '16 at 17:18

2 Answers

I changed two of your column names (Meta#1 and Meta#2 became Meta1 and Meta2), as # is not a syntactically valid character in R data frame column names.

With the dplyr package

library(dplyr)

df %>%
  group_by(Person) %>%
  summarise_all(funs(paste(unique(.), collapse=",")))
#   Person Dilution         Analyte Meta1 Meta2
# 1   john      1,2 Blank,mulv,gp41    3x   100
# 2  kelly       20      blank,gp41    3x   100
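
In more recent dplyr releases (1.0.0 and later), `funs()` is deprecated and the scoped `summarise_all()` has been superseded by `across()`. The following is a minimal sketch of the same idea in that style. The sample `df` is rebuilt here from the table in the question, with column types assumed (numeric `Dilution` and `Meta2`, character for the rest), and `res` is just an illustrative name.

```r
library(dplyr)

# Sample data mirroring the question (Meta#1/Meta#2 renamed to Meta1/Meta2).
# Column types are an assumption; the question does not state them.
df <- data.frame(
  Person   = c(rep("john", 6), rep("kelly", 2)),
  Dilution = c(1, 2, 1, 2, 1, 2, 20, 20),
  Analyte  = c("Blank", "Blank", "mulv", "mulv", "gp41", "gp41", "blank", "gp41"),
  Meta1    = "3x",
  Meta2    = 100,
  stringsAsFactors = FALSE
)

# For each person, collapse the unique values of every non-grouping column
# into a single comma-separated string.
res <- df %>%
  group_by(Person) %>%
  summarise(across(everything(), ~ paste(unique(.x), collapse = ",")))

print(res)
```

Note that every summarised column comes back as character, including `Dilution` and `Meta2`, because `paste()` always returns strings.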

With the data.table package

library(data.table)

setDT(df)[, lapply(.SD, function(x) paste(unique(x), collapse=",")), by=Person]
eipi10

A base R solution is to use aggregate.

aggregate(.~Person, df, function(x) as.character(unique(x)))


#   Person Dilution           Analyte Meta1 Meta2
# 1   john     1, 2 Blank, mulv, gp41    3x   100
# 2  kelly       20       blank, gp41    3x   100

Similarly,

aggregate(.~Person, df, function(x) toString(unique(x)))

This assumes that the columns holding strings are of class character, not factor.
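
If the string columns might be factors, one possible variant is to call the data-frame method of aggregate directly instead of going through the formula interface. The sketch below rebuilds the sample data from the question with `stringsAsFactors = TRUE` to illustrate this. The column types and the helper name `collapse_unique` are assumptions made for the example, not part of the original answer.

```r
# Sample data mirroring the question (Meta#1/Meta#2 renamed to Meta1/Meta2).
# Strings are deliberately stored as factors to exercise that case.
df <- data.frame(
  Person   = c(rep("john", 6), rep("kelly", 2)),
  Dilution = c(1, 2, 1, 2, 1, 2, 20, 20),
  Analyte  = c("Blank", "Blank", "mulv", "mulv", "gp41", "gp41", "blank", "gp41"),
  Meta1    = "3x",
  Meta2    = 100,
  stringsAsFactors = TRUE
)

# Hypothetical helper: join the unique values of a vector with ", ".
# toString() pastes factor labels, not their integer codes.
collapse_unique <- function(x) toString(unique(x))

# Aggregate every column except Person, grouping by Person.
res <- aggregate(df[-1], by = df["Person"], FUN = collapse_unique)

print(res)
```

Here `df[-1]` drops the first column (`Person`) from the values being summarised, and `df["Person"]` supplies it as a named grouping list, so the result keeps `Person` as its first column.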

Ronak Shah