
I have a big dataframe (the tibble printed below has over 400,000 rows). I want to create a new dataframe by extracting the 10 rows whose GeneID exactly matches one of the strings in another dataframe. What is the idiomatic R way to do this?

The first 5 rows of the big dataframe (Saponaria_mean_TPM_gene):

> Saponaria_mean_TPM_gene
# A tibble: 445,547 x 7
   GeneID               Flower Flower_bud Old_leaf     Root     Stem Young_leaf
   <chr>                 <dbl>      <dbl>    <dbl>    <dbl>    <dbl>      <dbl>
 1 TRINITY_DN0_c0_g1  612.       1202.    2282.    5645.    3645.      1740.   
 2 TRINITY_DN1_c0_g1   11.2        10.0     63.6     56.8     18.5       26.7  
 3 TRINITY_DN1_c1_g1    0.0306      0.161    0.719    0.984    5.44       0.174
 4 TRINITY_DN1_c2_g1    0.462       0.641    0.799    0.640    1.23       0.595
 5 TRINITY_DN1_c4_g1    0.327       0.140    1.13     2.43     1.80       1.54 

The strings I want to match against (dataframe coex_genes):

1   TRINITY_DN10031_c1_g1
2   TRINITY_DN10042_c0_g1
3   TRINITY_DN10042_c0_g3
4   TRINITY_DN10048_c0_g1
5   TRINITY_DN10058_c0_g1
6   TRINITY_DN10067_c5_g1
7   TRINITY_DN100732_c0_g1
8   TRINITY_DN100752_c0_g1
9   TRINITY_DN10093_c1_g5
10  TRINITY_DN100979_c0_g1

For example, the row for TRINITY_DN10031_c1_g1 should look like this:

  GeneID                Flower Flower_bud Old_leaf  Root  Stem Young_leaf
  <chr>                  <dbl>      <dbl>    <dbl> <dbl> <dbl>      <dbl>
1 TRINITY_DN10031_c1_g1   1.78       2.08        0 0.226 0.544          0

I can get a single row manually with dplyr:

library(dplyr)
gene1 <- filter(Saponaria_mean_TPM_gene, GeneID == "TRINITY_DN10031_c1_g1")
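Here is a minimal sketch of generalising that filter. It assumes the gene IDs are in the first column of coex_genes, because the column name isn't shown, so the code uses coex_genes[[1]]. The data frames below are small hypothetical stand-ins, not the real tables. Swapping `==` for `%in%` tests every GeneID against the whole set of wanted IDs in one vectorised step:

```r
# Hypothetical toy stand-ins for the real tables (values are made up)
Saponaria_mean_TPM_gene <- data.frame(
  GeneID = c("TRINITY_DN0_c0_g1", "TRINITY_DN10031_c1_g1",
             "TRINITY_DN1_c0_g1", "TRINITY_DN10042_c0_g1"),
  Flower = c(612, 1.78, 11.2, 3.5),
  stringsAsFactors = FALSE
)
coex_genes <- data.frame(
  c("TRINITY_DN10031_c1_g1", "TRINITY_DN10042_c0_g1"),
  stringsAsFactors = FALSE
)

# Assumption: the wanted gene IDs sit in the first column of coex_genes
wanted_ids <- coex_genes[[1]]

# %in% is TRUE for each GeneID found anywhere in wanted_ids (exact,
# case-sensitive comparison), so one logical index replaces ten == filters
coex_rows <- Saponaria_mean_TPM_gene[Saponaria_mean_TPM_gene$GeneID %in% wanted_ids, ]
print(coex_rows)
```

With dplyr the same condition should drop straight into the existing call: `filter(Saponaria_mean_TPM_gene, GeneID %in% coex_genes[[1]])`. Either way, rows come back in the order they appear in the big table, not the order of coex_genes.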

Should I write a loop for this (if that's sensible), or is there a better way to build a dataframe containing just the 10 genes listed in coex_genes?
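On the loop question, a loop works but probably isn't needed. The sketch below uses hypothetical toy data, with `wanted_ids` standing in for coex_genes[[1]]. It contrasts a loop-style `lapply()` with base R's `match()`, which also returns the rows in the order of the ID list:

```r
# Hypothetical toy data (values are made up)
Saponaria_mean_TPM_gene <- data.frame(
  GeneID = c("TRINITY_DN0_c0_g1", "TRINITY_DN10031_c1_g1",
             "TRINITY_DN1_c0_g1", "TRINITY_DN10042_c0_g1"),
  Flower = c(612, 1.78, 11.2, 3.5),
  stringsAsFactors = FALSE
)
wanted_ids <- c("TRINITY_DN10042_c0_g1", "TRINITY_DN10031_c1_g1")

# Loop-style version: one subset per ID, then stack the pieces.
# It works, but it rescans the whole table once per ID.
pieces <- lapply(wanted_ids, function(id) {
  Saponaria_mean_TPM_gene[Saponaria_mean_TPM_gene$GeneID == id, ]
})
coex_loop <- do.call(rbind, pieces)

# Vectorised version that keeps the order of wanted_ids:
# match() gives the row index of each ID's first occurrence (NA if absent)
idx <- match(wanted_ids, Saponaria_mean_TPM_gene$GeneID)
coex_match <- Saponaria_mean_TPM_gene[idx[!is.na(idx)], ]
print(coex_match)
```

`match()` keeps only the first row for each ID. That is fine if GeneID is unique in the big table, which is an assumption here, whereas `%in%` would return every matching row. IDs with no match are simply dropped.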

glitterbox
