I have a big dataframe (60,000+ rows). I want to create a new dataframe by extracting the 10 rows whose GeneID exactly matches one of the strings in another dataframe. How can I do this in an idiomatic R way?
The first 5 rows of the big dataframe (Saponaria_mean_TPM_gene):
> Saponaria_mean_TPM_gene
# A tibble: 445,547 x 7
GeneID Flower Flower_bud Old_leaf Root Stem Young_leaf
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 TRINITY_DN0_c0_g1 612. 1202. 2282. 5645. 3645. 1740.
2 TRINITY_DN1_c0_g1 11.2 10.0 63.6 56.8 18.5 26.7
3 TRINITY_DN1_c1_g1 0.0306 0.161 0.719 0.984 5.44 0.174
4 TRINITY_DN1_c2_g1 0.462 0.641 0.799 0.640 1.23 0.595
5 TRINITY_DN1_c4_g1 0.327 0.140 1.13 2.43 1.80 1.54
The strings I want to match against (dataframe coex_genes):
1 TRINITY_DN10031_c1_g1
2 TRINITY_DN10042_c0_g1
3 TRINITY_DN10042_c0_g3
4 TRINITY_DN10048_c0_g1
5 TRINITY_DN10058_c0_g1
6 TRINITY_DN10067_c5_g1
7 TRINITY_DN100732_c0_g1
8 TRINITY_DN100752_c0_g1
9 TRINITY_DN10093_c1_g5
10 TRINITY_DN100979_c0_g1
So, for example, the row for TRINITY_DN10031_c1_g1 should be:
GeneID Flower Flower_bud Old_leaf Root Stem Young_leaf
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 TRINITY_DN10031_c1_g1 1.78 2.08 0 0.226 0.544 0
I can get this manually for a single gene using dplyr:
library(dplyr)
gene1 <- filter(Saponaria_mean_TPM_gene, GeneID == "TRINITY_DN10031_c1_g1")
How can I write a loop (if that's sensible) or something else to find and create a dataframe of the 10 genes in coex_genes?
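
A loop is probably not needed here, since exact matching against a set of strings is vectorised in R. The sketch below shows two possible approaches on toy data: `%in%` inside `filter()`, and a filtering join with `semi_join()`. It assumes the IDs in coex_genes sit in a column named GeneID (the column name is not shown above, so this is a guess), uses plain data frames with only three of the expression columns, and names the results `coex_tpm` and `coex_tpm_join` purely for illustration.

```r
library(dplyr)

# Toy stand-in for the big table: rounded values taken from the rows printed
# above, with only three of the six expression columns kept for brevity.
Saponaria_mean_TPM_gene <- data.frame(
  GeneID     = c("TRINITY_DN0_c0_g1", "TRINITY_DN1_c0_g1", "TRINITY_DN10031_c1_g1"),
  Flower     = c(612, 11.2, 1.78),
  Flower_bud = c(1202, 10.0, 2.08),
  Root       = c(5645, 56.8, 0.226),
  stringsAsFactors = FALSE
)

# ASSUMPTION: the ID column in coex_genes is called GeneID.
# Rename it (or change the code below) if the real column differs.
coex_genes <- data.frame(
  GeneID = c("TRINITY_DN10031_c1_g1", "TRINITY_DN10042_c0_g1"),
  stringsAsFactors = FALSE
)

# Option 1: keep rows whose GeneID is exactly one of the wanted IDs.
coex_tpm <- filter(Saponaria_mean_TPM_gene, GeneID %in% coex_genes$GeneID)

# Option 2: a filtering join; keeps rows of the first table that have a
# match in the second, without adding any columns from it.
coex_tpm_join <- semi_join(Saponaria_mean_TPM_gene, coex_genes, by = "GeneID")

print(coex_tpm)
```

Both options do exact string comparison, so TRINITY_DN1_c0_g1 would not be picked up by an ID such as TRINITY_DN10031_c1_g1. IDs in coex_genes that are absent from the big table (TRINITY_DN10042_c0_g1 in the toy data) are silently dropped, so it may be worth comparing `nrow(coex_tpm)` with `nrow(coex_genes)` afterwards.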