
I have a list of transcription factors (TFs) and the proteins to which they are assumed to attach, as shown below:

[![enter image description here][1]][1]

I want a two-column table: the first column holds the TF and the second holds one individual protein per cell, so each TF is repeated in the first column once for every protein it is paired with.

Any help with getting this?

Angel

2 Answers

Does this work:

library(tidyr)
library(dplyr)

df %>% separate_rows(Proteins, sep = ';')
# A tibble: 15 x 2
   TFs   Proteins
   <chr> <chr>   
 1 HNF4A HNF4A   
 2 HNF4A SUB1    
 3 E2F1  RB1     
 4 E2F1  E2F1    
 5 E2F1  E2F1    
 6 E2F1  TFDP1   
 7 E2F1  GABPB2  
 8 E2F1  CCNA2   
 9 E2F1  RBL1    
10 E2F1  E2F1    
11 E2F1  RB1     
12 E2F1  E2F1    
13 E2F1  CEBPE   
14 E2F1  E2F1    
15 E2F1  TFDP1   

Data used:

df
    TFs          Proteins
1 HNF4A        HNF4A;SUB1
2  E2F1          RB1;E2F1
3  E2F1 E2F1;TFDP1;GABPB2
4  E2F1   CCNA2;RBL1;E2F1
5  E2F1    RB1;E2F1;CEBPE
6  E2F1        E2F1;TFDP1
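
For anyone who wants to reproduce this, the printed data above can be rebuilt roughly as follows. This is a sketch that assumes both columns are plain character vectors, since the original construction of `df` is not shown:

```r
# Rebuild the example input printed above.
# Assumption: both columns are character (not factor) columns.
df <- data.frame(
  TFs = c("HNF4A", "E2F1", "E2F1", "E2F1", "E2F1", "E2F1"),
  Proteins = c("HNF4A;SUB1", "RB1;E2F1", "E2F1;TFDP1;GABPB2",
               "CCNA2;RBL1;E2F1", "RB1;E2F1;CEBPE", "E2F1;TFDP1"),
  stringsAsFactors = FALSE
)
```
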
Karthik S
library(tidyverse)

df %>% separate_rows(Proteins, sep = ";")

# A tibble: 15 x 2
   TFs   Proteins
   <chr> <chr>   
 1 HNF4A HNF4A   
 2 HNF4A SUB1    
 3 E2F1  RB1     
 4 E2F1  E2F1    
 5 E2F1  E2F1    
 6 E2F1  TFDP1   
 7 E2F1  GABPB2  
 8 E2F1  CCNA2   
 9 E2F1  RBL1    
10 E2F1  E2F1    
11 E2F1  RB1     
12 E2F1  E2F1    
13 E2F1  CEBPE   
14 E2F1  E2F1    
15 E2F1  TFDP1 

data.table

library(data.table)
setDT(df)[, list(Proteins = unlist(strsplit(Proteins, split = ";"))), by = TFs]

     TFs Proteins
 1: HNF4A    HNF4A
 2: HNF4A     SUB1
 3:  E2F1      RB1
 4:  E2F1     E2F1
 5:  E2F1     E2F1
 6:  E2F1    TFDP1
 7:  E2F1   GABPB2
 8:  E2F1    CCNA2
 9:  E2F1     RBL1
10:  E2F1     E2F1
11:  E2F1      RB1
12:  E2F1     E2F1
13:  E2F1    CEBPE
14:  E2F1     E2F1
15:  E2F1    TFDP1
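
For comparison, the same reshaping can probably be done in base R without extra packages. This is only a sketch and not part of the answers above; `parts` and `long` are illustrative names, and the input is assumed to match the example data with character columns:

```r
# Example input, assumed to match the data printed in the first answer.
df <- data.frame(
  TFs = c("HNF4A", "E2F1", "E2F1", "E2F1", "E2F1", "E2F1"),
  Proteins = c("HNF4A;SUB1", "RB1;E2F1", "E2F1;TFDP1;GABPB2",
               "CCNA2;RBL1;E2F1", "RB1;E2F1;CEBPE", "E2F1;TFDP1"),
  stringsAsFactors = FALSE
)

# Split each Proteins string on ";" into a list of character vectors.
parts <- strsplit(df$Proteins, ";", fixed = TRUE)

# Repeat each TF once per protein in its row, then flatten the list.
long <- data.frame(
  TFs = rep(df$TFs, lengths(parts)),
  Proteins = unlist(parts),
  stringsAsFactors = FALSE
)
```

Because it works row by row rather than grouping by `TFs`, this version keeps the original row order even when the same TF appears in non-adjacent rows.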
Yuriy Saraykin