-1

I have an excel input file containing information about blood samples. Each subject gave three blood samples; each sample has a numeric ID. These sample IDs are currently encoded in a column as follows:

Surname Name    Idsample
Jash    Milena  05311511
Jash    Milena  05311512
Jash    Milena  05311514
Sailor  Zac 06011509
Sailor  Zac 06011510
Sailor  Zac 06011511

What I want to do is arrange the data such that each subject is associated with a corresponding list of comma-separated sample ids in one row:

Surname Name    Idsample
Jash    Milena  05311511,05311512,05311514
Sailor  Zac 06011509,06011510,06011511

but i am having trouble with that. I can use only R. Thanks

jwezorek
  • 8,592
  • 1
  • 29
  • 46

1 Answers1

2

Here is a way using the tidyverse set of packages:

library(tidyverse)

sample_data <-tribble(~Surname, ~Name, ~Idsample,
                      "Jash",    "Milena",  "05311511",
                      "Jash",    "Milena",  "05311512",
                      "Jash",    "Milena",  "05311514",
                      "Sailor",  "Zac", "06011509",
                      "Sailor",  "Zac", "06011510",
                      "Sailor",  "Zac", "06011511")
sample_data_transformed <-
   sample_data %>% group_by(Surname,Name) %>% 
   summarise(Idsample = paste0(Idsample,collapse = ",")) %>% 
   ungroup()

jludewig
  • 428
  • 2
  • 8
  • Thanks a lot for your suggestion it works well, but if I want to do the same operation of summarise on a third column? so to do the same summarize on 2 or 3 columns not just one? – MicheleQuail Aug 13 '19 at 14:08
  • you can either add each column individually to the summarise statement like `summarise(Idsample1 = paste0(Idsample1,collapse = ","),Idsample2 = paste0(Idsample2,collapse = ","))` or if you want to do this for all the columns except the ones you group by you can use `summarise_all(.funs = list(~ paste0(.,collapse = ","))` – jludewig Aug 13 '19 at 14:53