I have a data.frame with only two columns: barcodeid and gene.

barcodeid gene
M001-M008-S137 IL12RB1
M001-M008-S137 IL7RA
M001-M008-S137 LMP1
M001-M012-S080 CRLF2
M001-M012-S080 ICOS
M001-M012-S080 IL7RA

I want to end up with this table:

barcodeID geneSequence
M001-M008-S137 IL12RB1-IL7RA-LMP1
M001-M012-S080 CRLF2-ICOS-IL7RA

I've looked up reshape, dcast, spread, and gather in R, and as far as I can tell none of these functions would let me do this. I'd appreciate any help!

Jilber Urbina
Beeba
    Thanks Frank, this worked as well. I'm doing it this way purely for aesthetic reasons because this is how people want to see the data presented in a table. Thanks a lot! – Beeba Apr 20 '18 at 15:34

3 Answers

Assuming df is your data.frame, a combination of base R functions does the job:

> x <- lapply(split(df$gene, df$barcodeid), paste0, collapse="-")
> data.frame(barcodeid=names(x), geneSequence=unlist(x), row.names = NULL)
       barcodeid       geneSequence
1 M001-M008-S137 IL12RB1-IL7RA-LMP1
2 M001-M012-S080   CRLF2-ICOS-IL7RA
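
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the question's example data by hand (the object names df, x and res are only illustrative) and swaps lapply() plus unlist() for vapply(), which is a variation on the approach above rather than a different method:

```r
# Rebuild the question's example data (values copied from the question).
df <- data.frame(
  barcodeid = rep(c("M001-M008-S137", "M001-M012-S080"), each = 3),
  gene = c("IL12RB1", "IL7RA", "LMP1", "CRLF2", "ICOS", "IL7RA"),
  stringsAsFactors = FALSE
)

# Split the genes by barcode and collapse each group into one string.
# vapply() returns a named character vector, so unlist() is not needed.
x <- vapply(split(df$gene, df$barcodeid), paste, character(1), collapse = "-")

# Assemble the result with one row per barcode.
res <- data.frame(
  barcodeid = names(x),
  geneSequence = unname(x),
  stringsAsFactors = FALSE
)
print(res)
```

Within each barcode the genes are joined in the order in which they appear in df, so sort the rows first if a particular order matters.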
Jilber Urbina
With dplyr you could do:

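library(dplyr)

df %>% 
  group_by(barcodeid) %>% 
  # Attach the collapsed gene string to every row of its group
  mutate(geneSequence = paste(gene, collapse = "-")) %>%
  select(-gene) %>% 
  # Keep one row per barcodeid
  slice(1)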


# A tibble: 2 x 2
# Groups:   barcodeid [2]
   barcodeid       geneSequence
      <fctr>              <chr>
1 M001-M008-S137 IL12RB1-IL7RA-LMP1
2 M001-M012-S080   CRLF2-ICOS-IL7RA
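
A possibly shorter variant, offered as a sketch rather than as the answer's own method: summarise() collapses each group to a single row directly, so the select() and slice() steps are not needed. The data frame below is rebuilt by hand from the question, and res is just an illustrative name:

```r
library(dplyr)

# Example data copied from the question.
df <- data.frame(
  barcodeid = rep(c("M001-M008-S137", "M001-M012-S080"), each = 3),
  gene = c("IL12RB1", "IL7RA", "LMP1", "CRLF2", "ICOS", "IL7RA"),
  stringsAsFactors = FALSE
)

# summarise() returns one row per group, so no de-duplication step is required.
res <- df %>%
  group_by(barcodeid) %>%
  summarise(geneSequence = paste(gene, collapse = "-"))

print(res)
```

Unlike the mutate() version, the result here is no longer grouped by barcodeid, which is usually what you want for a final table.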
Lennyy

Some more options (here DT is the input data.frame):

reshape2::dcast(DT, barcodeid ~ ., paste, collapse="-")

aggregate(. ~ barcodeid, DT, paste, collapse="-")

aggregate has the benefit of naming the new column "gene" rather than "." here. If a different name is needed, the two calls are interchangeable: store the result (here as res) and follow up with...

names(res)[2] <- "geneSequence"

To reverse the transformation and get back one gene per row, one approach is:

splitstackshape::cSplit(res, "geneSequence", "-", direction = "long")

See Split comma-separated column into separate rows for many more options.
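
For a base R route to the same reversal, a minimal sketch could use strsplit() and rep(). It assumes res has the columns barcodeid and geneSequence as above (rebuilt by hand here so the snippet runs on its own), that no gene name itself contains a "-", and the name long is only illustrative:

```r
# Collapsed table, rebuilt by hand from the desired output in the question.
res <- data.frame(
  barcodeid = c("M001-M008-S137", "M001-M012-S080"),
  geneSequence = c("IL12RB1-IL7RA-LMP1", "CRLF2-ICOS-IL7RA"),
  stringsAsFactors = FALSE
)

# Split each collapsed string on "-"; fixed = TRUE treats "-" literally.
genes <- strsplit(res$geneSequence, "-", fixed = TRUE)

# Repeat each barcode once per gene and flatten the list of genes.
long <- data.frame(
  barcodeid = rep(res$barcodeid, lengths(genes)),
  gene = unlist(genes),
  stringsAsFactors = FALSE
)
print(long)
```

With tidyr installed, tidyr::separate_rows(res, geneSequence, sep = "-") should give a similar long table, with the column still named geneSequence.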

Frank
  • If I wanted to make my first table again, how could I do that? How can I split the geneSequence into three rows with the same barcodeid but one gene in each row? – Beeba Apr 20 '18 at 16:19
  • @HibaShaban I've edited in one way to approach it with a link to more. I'm not sure what the tidyr or base R ways would be – Frank Apr 20 '18 at 16:25