I have a data frame with a single column and multiple rows. Each row contains gene names separated by "|", and I need to replace each "|" with a new line.

df looks like:

  gene
1 EIF2S1|PLEK2
2 IGHV1-45|IGHV1-46|IGHV1-58|IGHV3-33|IGHV3-35|IGHV3-38|IGHV3-43|IGHV3-48|IGHV3-49|IGHV3-53|IGHV4-31|IGHV4-34|IGHV4-39|IGHV4-59|IGHV5-51
3 SERPINA1|SERPINA2

Desired output (I need a character vector containing the list of genes):

EIF2S1
PLEK2
IGHV1-45
IGHV1-58
...next-gene
...next-gene
...
...
SERPINA2

What I have tried so far, which does not work:

gsub("^|", "", trimws(user_filt))
RKK
  • Do you want to split to a new row? Or do you just want to insert a line break character into the string? I'm not quite sure from your description. – MrFlick Oct 12 '21 at 03:35
  • The answer is below, thanks though! – RKK Oct 12 '21 at 03:39

1 Answer

Does this work?

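library(dplyr)  # provides the %>% pipe
library(tidyr)  # provides separate_rows()
# Split each row on a literal "|" (escaped, because "|" means "or" in a regex)
df %>% separate_rows(gene, sep = '\\|')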
# A tibble: 19 x 1
   gene    
   <chr>   
 1 EIF2S1  
 2 PLEK2   
 3 IGHV1-45
 4 IGHV1-46
 5 IGHV1-58
 6 IGHV3-33
 7 IGHV3-35
 8 IGHV3-38
 9 IGHV3-43
10 IGHV3-48
11 IGHV3-49
12 IGHV3-53
13 IGHV4-31
14 IGHV4-34
15 IGHV4-39
16 IGHV4-59
17 IGHV5-51
18 SERPINA1
19 SERPINA2
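
Since the question asks for a character vector rather than a data frame, a base R alternative may be worth trying. The following is only a sketch: it assumes `gene` is a character column (not a factor), and the two-row `df` is rebuilt here purely for illustration. Note that an unescaped `|` is the alternation operator in a regular expression, so the split pattern either needs escaping (`"\\|"`) or `fixed = TRUE`, as used below.

```r
# Small illustrative data frame (an assumption; the real df has more rows)
df <- data.frame(
  gene = c("EIF2S1|PLEK2", "SERPINA1|SERPINA2"),
  stringsAsFactors = FALSE
)

# strsplit() returns a list with one element per row;
# fixed = TRUE treats "|" as a literal character, not a regex operator
genes <- unlist(strsplit(df$gene, "|", fixed = TRUE))

genes
# [1] "EIF2S1"   "PLEK2"    "SERPINA1" "SERPINA2"
```

With the tidyr result above, something like `pull(gene)` from dplyr on the output of `separate_rows()` should give the same kind of vector.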

Data used:

df
                                                                                                                                    gene
1                                                                                                                           EIF2S1|PLEK2
2 IGHV1-45|IGHV1-46|IGHV1-58|IGHV3-33|IGHV3-35|IGHV3-38|IGHV3-43|IGHV3-48|IGHV3-49|IGHV3-53|IGHV4-31|IGHV4-34|IGHV4-39|IGHV4-59|IGHV5-51
3                                                                                                                      SERPINA1|SERPINA2
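
To make the example reproducible, the printed data above could be rebuilt roughly as follows. This is a sketch that assumes `gene` is stored as character; the final count is only a sanity check against the 19 rows shown in the output.

```r
# Rebuild the three-row data frame from the printed values
df <- data.frame(
  gene = c(
    "EIF2S1|PLEK2",
    paste(
      "IGHV1-45", "IGHV1-46", "IGHV1-58", "IGHV3-33", "IGHV3-35",
      "IGHV3-38", "IGHV3-43", "IGHV3-48", "IGHV3-49", "IGHV3-53",
      "IGHV4-31", "IGHV4-34", "IGHV4-39", "IGHV4-59", "IGHV5-51",
      sep = "|"
    ),
    "SERPINA1|SERPINA2"
  ),
  stringsAsFactors = FALSE
)

# Sanity check: number of genes after splitting on a literal "|"
n_genes <- length(unlist(strsplit(df$gene, "|", fixed = TRUE)))
n_genes
# [1] 19
```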
Karthik S