Concatenate rows in a column by ID in R

Question

I have a geochemical data set that is has 50 columns and 16000 rows. I need to concatenate data in rows in the Field_Notes column grouped by the Sample_ID. I have tried using the aggregate function which works but I end up with a subset of only those two columns and duplicates deleted. I would be very grateful if anyone has any suggestions.

My data looks like this:

   Sampe_ID Year Alt_Min    Field_Notes  X
1 97PQQT007 1997 Fe-Carb      qtz sweel  2
2 97PQQT007 1997 Fe-Carb          v. tr  2
3 97PQQT014 1997    <NA> qtz vn in bslt 15
4 97PQQT014 1997    <NA>    1-2% py,cpy 15
5 97PQQT006 1997 Fe-Carb         qtz vn  2
6 97PQQT004 1997 Fe-Carb         qtz vn  1
7 97PQQT004 1997 Fe-Carb           none  1

But should look like this:

   Sampe_ID Year Alt_Min                 Field_Notes  X
1 97PQQT007 1997 Fe-Carb            qtz sweel, v. tr  2
2 97PQQT007 1997 Fe-Carb            qtz sweel, v. tr  2
3 97PQQT014 1997    <NA> qtz vn in bslt, 1-2% py,cpy 15
4 97PQQT014 1997    <NA> qtz vn in bslt, 1-2% py,cpy 15
5 97PQQT006 1997 Fe-Carb                      qtz vn  2
6 97PQQT004 1997 Fe-Carb                qtz vn, none  1
7 97PQQT004 1997 Fe-Carb                qtz vn, none  1

Here is a reproducible dataframe:

geochem <- data.frame(Sample_ID= c(1,1,2,2,3,4,4), Year = rep(1997, 7), Alt_Min = c(rep("Fe-Carb",2), rep(NA,2), rep("Fe-Carb",3)), Field_Notes = c("qtz sweel", "v. tr", "qtz vn in bslt", "1-2% py,cpy", "qtz  vn", "qtz vn", "none"), x = c(2,2,15,15,2,1,1))

Possible duplicate http://stackoverflow.com/questions/15933958/collapse-concatenate-aggregate-a-column-to-a-single-comma-separated-string-w — Jaap, Oct 19 '16 at 18:12
Try using `ave`. Something like `df$newVar <- ave(df$Field_Notes, df$Sampe_ID, FUN=c)` or `paste` instead of `c` maybe. — lmo, Oct 19 '16 at 18:12
With the `data.table` package: `setDT(df)[, Field_Notes := toString(Field_Notes), by = Sample_ID]` or with `dplyr`: `df %>% group_by(Sample_ID) %>% mutate(Field_Notes = toString(Field_Notes))` — Jaap, Oct 19 '16 at 18:17
Thank you for your help. I have posted a reproducible dataframe and apologize if I am posting incorrectly. I am a newbie here! I have tried your suggestions but with no luck. I have really been struggling to find an answer to this. — Marycee, Oct 19 '16 at 19:21

score 3 · Accepted Answer · answered Oct 19 '16 at 19:57

You can use a combination of within and ave to transform the data frame.

within(geochem,
       {Field_Notes <- as.character(Field_Notes);
        Field_Notes <- ave(Field_Notes, Sample_ID, FUN = toString)})

The function toString is used to paste strings together.

Note that the factor Field_Notes is transformed to a string variable.

The result:

  Sample_ID Year Alt_Min                 Field_Notes  x
1         1 1997 Fe-Carb            qtz sweel, v. tr  2
2         1 1997 Fe-Carb            qtz sweel, v. tr  2
3         2 1997    <NA> qtz vn in bslt, 1-2% py,cpy 15
4         2 1997    <NA> qtz vn in bslt, 1-2% py,cpy 15
5         3 1997 Fe-Carb                     qtz  vn  2
6         4 1997 Fe-Carb                qtz vn, none  1
7         4 1997 Fe-Carb                qtz vn, none  1

That's brilliant! Thank you so much. I have been struggling with this for hours. — Marycee, Oct 19 '16 at 20:05

boshek · Answer 2 · 2016-10-19T19:47:42.757

2

Without a reproducible dataframe, I can't be sure if this will work but it should. Use dplyr here and replace df with your dataframe name.

library(dplyr)

geochem <- data.frame(Sample_ID= c(1,1,2,2,3,4,4), 
                      Year = rep(1997, 7), Alt_Min = c(rep("Fe-Carb",2), rep(NA,2), rep("Fe-Carb",3)), 
                      Field_Notes = c("qtz sweel", "v. tr", "qtz vn in bslt", 
                                      "1-2% py,cpy", "qtz  vn", "qtz vn", "none"), 
                      x = c(2,2,15,15,2,1,1)) 

geochem %>% group_by(x,Sample_ID) %>% summarise(Field_Notes=paste(Field_Notes, collapse=","))

Here is the exact output I get:

   x      Sample_ID                Field_Notes
  <dbl>     <dbl>                      <chr>
     1         4                qtz vn,none
     2         1            qtz sweel,v. tr
     2         3                    qtz  vn
    15         2 qtz vn in bslt,1-2% py,cpy

edited Oct 19 '16 at 19:47

answered Oct 19 '16 at 18:11

boshek

4,100
1
31
55

Thank you very much for your suggestion. Here is a reproducible dataframe which may help: – Marycee Oct 19 '16 at 19:05
Unfortunately this was the result when i ran your code: – Marycee Oct 19 '16 at 19:13
Field_Notes 1 qtz sweel,v. tr,qtz vn in bslt,1-2% py,cpy,qtz vn,qtz vn,none > – Marycee Oct 19 '16 at 19:14
Edited the answer above using your example then added `Sample_ID` as a grouping variable. Works well now. – boshek Oct 19 '16 at 19:23
Thanks, I am not sure if I am running your code correctly but the result is a dataframe with 1 observation of 1 variable. I need to retain the entire dataset and just concatenate the rows in the Field_Notes column grouped by the Sample_ID. – Marycee Oct 19 '16 at 19:38
Do you not also want them grouped by `x`? Is the output above not the one wanted? – boshek Oct 19 '16 at 19:48

Concatenate rows in a column by ID in R

2 Answers2

Related