
I am looking to split up a string into separate rows in a data frame and I'm not sure what the best approach would be. If I have a data frame that looks like this:

Gene=c("Gene1","Gene2", "Gene3")
Alt=c("ABC, DEF", "XYZ", "ABC, XYZ")
df=data.frame(Gene, Alt)

The goal is to split each string so that every value gets its own row, with the matching Gene repeated next to it, so the result looks like this:

Gene.b=c("Gene1", "Gene1", "Gene2", "Gene3", "Gene3")
Alt.b=c("ABC","DEF","XYZ","ABC","XYZ")
df2=data.frame(Gene.b, Alt.b)

I tried several approaches with stringr but couldn't get the split values to stay matched to their associated gene. I'm looking for an operation that is roughly the opposite of the toString() function.
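Since toString() joins values with ", ", base R's strsplit() on that same separator is its closest inverse. The sketch below is a base R alternative (not the accepted answer's approach). It assumes the Alt values are always joined with exactly ", ", and it keeps genes aligned by repeating each one once per split piece:

```r
# Base R sketch: strsplit() on ", " undoes toString(), which joins with ", "
Gene <- c("Gene1", "Gene2", "Gene3")
Alt  <- c("ABC, DEF", "XYZ", "ABC, XYZ")
# strsplit() needs a character vector, so keep Alt from becoming a factor
df   <- data.frame(Gene, Alt, stringsAsFactors = FALSE)

# One character vector of pieces per original row
pieces <- strsplit(df$Alt, ", ", fixed = TRUE)

# Repeat each Gene as many times as its row produced pieces
df2 <- data.frame(
  Gene.b = rep(df$Gene, lengths(pieces)),
  Alt.b  = unlist(pieces),
  stringsAsFactors = FALSE
)
print(df2)
```

If the separator can vary (for example, a comma with or without a space), a regular expression such as ",[[:space:]]*" without fixed = TRUE is more forgiving.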

user2900006

1 Answer


Use separate_rows() from tidyr to put each value on its own row, then change the column names with rename_all():

library(tidyr)
library(dplyr)
# Split each Alt string onto its own row, then append ".b" to every column name
separate_rows(df, Alt) %>%
    rename_all(~ paste0(., '.b'))
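A fuller, self-contained version is sketched below. It makes two assumptions beyond the answer: it passes an explicit sep = ", " so only the comma-space delimiter is split, and it uses rename_with(), which supersedes rename_all() in dplyr 1.0.0 and later.

```r
library(tidyr)
library(dplyr)

df <- data.frame(
  Gene = c("Gene1", "Gene2", "Gene3"),
  Alt  = c("ABC, DEF", "XYZ", "ABC, XYZ"),
  stringsAsFactors = FALSE
)

out <- df %>%
  # sep is a regular expression; ", " matches the comma-space between values
  separate_rows(Alt, sep = ", ") %>%
  # Append ".b" to every column name (rename_with needs dplyr >= 1.0.0)
  rename_with(~ paste0(.x, ".b"))

print(out)
```

The result is a tibble whose Gene.b and Alt.b columns hold the same values as df2 from the question.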
akrun
    This is great, thanks. My df contained some other characters (namely '_'), and the default separator split on those as well, so I added a sep argument (e.g. separate_rows(df, Alt, sep = ",")) to get what I needed. – user2900006 Nov 13 '18 at 19:06
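The behaviour described in the comment comes from separate_rows()'s default sep, "[^[:alnum:].]+", which treats any run of characters other than letters, digits, and "." as a delimiter, underscores included. The sketch below uses a hypothetical value "ABC_1, DEF" to compare the default, sep = ",", and a pattern that also absorbs the space after the comma:

```r
library(tidyr)

# Hypothetical row whose first alteration name contains an underscore
df <- data.frame(Gene = "Gene1", Alt = "ABC_1, DEF", stringsAsFactors = FALSE)

# Default sep also splits on "_", breaking "ABC_1" apart
default_split <- separate_rows(df, Alt)
default_split$Alt    # "ABC" "1" "DEF"

# sep = "," keeps "ABC_1" intact but leaves the leading space on " DEF"
comma_split <- separate_rows(df, Alt, sep = ",")
comma_split$Alt      # "ABC_1" " DEF"

# Allowing optional whitespace after the comma gives clean values
clean_split <- separate_rows(df, Alt, sep = ",\\s*")
clean_split$Alt      # "ABC_1" "DEF"
```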