I have a very large data frame called data (more than 200,000 rows) containing genomic positions for different genes. I want to extract all rows for a chosen set of genes and combine them into a new data frame. For example, I want all rows for SSR1 and STK38:
chrom txStart ExonCount geneSymbol
chr6 7281287 8 SSR1
chr6 7295624 8 SSR1
chr6 7298155 8 SSR1
chr6 31938951 8 STK19
chr6 31939645 8 STK19
chr6 31940397 8 STK19
chr6 36461668 14 STK38
chr6 36464487 14 STK38
chr6 36465556 14 STK38
chr6 125229391 7 STL
chr6 125241333 7 STL
chr6 125252841 7 STL
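To make the problem reproducible, the sample above can be rebuilt as a data frame. This is a minimal sketch that re-enters only the 12 rows shown; the column types (integer positions and exon counts, character gene symbols) are an assumption about how the real data was read in.

```r
# Sketch: rebuild the 12 sample rows shown above as a data frame named `data`.
# Assumption: txStart and ExonCount are integers, geneSymbol is character.
data <- data.frame(
  chrom      = rep("chr6", 12),
  txStart    = c(7281287L, 7295624L, 7298155L,
                 31938951L, 31939645L, 31940397L,
                 36461668L, 36464487L, 36465556L,
                 125229391L, 125241333L, 125252841L),
  ExonCount  = rep(c(8L, 8L, 14L, 7L), each = 3),
  geneSymbol = rep(c("SSR1", "STK19", "STK38", "STL"), each = 3),
  stringsAsFactors = FALSE
)
str(data)
```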
Of course, I could do this with which, as below, and then combine the pieces with rbind, but that's too time-consuming since I'll be working with a lot of genes:
Gene1 <- data[which(data$geneSymbol=="SSR1"), ]
Gene2 <- data[which(data$geneSymbol=="STK38"), ]
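One way to avoid a separate which() call per gene is a single logical subset with the base R %in% operator, which is TRUE for every row whose geneSymbol appears in the gene vector. A minimal sketch, assuming data has the columns shown above (the sample rows are re-entered so the snippet runs on its own; the name result is illustrative):

```r
# Sample rows re-entered only so this snippet is self-contained.
data <- data.frame(
  chrom      = rep("chr6", 12),
  txStart    = c(7281287L, 7295624L, 7298155L, 31938951L, 31939645L, 31940397L,
                 36461668L, 36464487L, 36465556L, 125229391L, 125241333L, 125252841L),
  ExonCount  = rep(c(8L, 8L, 14L, 7L), each = 3),
  geneSymbol = rep(c("SSR1", "STK19", "STK38", "STL"), each = 3),
  stringsAsFactors = FALSE
)

genes <- c("SSR1", "STK38")

# One vectorised test over all rows instead of one which() + rbind() per gene.
result <- data[data$geneSymbol %in% genes, ]

# Optional: renumber rows 1..n (subsetting keeps the original row names).
rownames(result) <- NULL
print(result)
```

Note that %in% keeps rows in their original order in data, not in the order of genes; here SSR1 already precedes STK38, so the output matches the desired table.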
I've tried a for loop, but I'm not getting the right output:
genes1 <- 0
genes <- c("SSR1", "STK38")
for (i in genes) {
genes1 <- print(data[which(data$geneSymbol==i), ])
}
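The loop above reassigns genes1 on every iteration, so only the last gene's rows survive (print() returns its argument invisibly, so genes1 ends up holding the STK38 rows). A minimal sketch of a working loop, assuming the same columns: collect each subset in a list and bind them once at the end with do.call(rbind, ...). The name pieces is hypothetical, and the sample rows are re-entered so the snippet runs alone.

```r
# Sample rows re-entered only so this snippet is self-contained.
data <- data.frame(
  chrom      = rep("chr6", 12),
  txStart    = c(7281287L, 7295624L, 7298155L, 31938951L, 31939645L, 31940397L,
                 36461668L, 36464487L, 36465556L, 125229391L, 125241333L, 125252841L),
  ExonCount  = rep(c(8L, 8L, 14L, 7L), each = 3),
  geneSymbol = rep(c("SSR1", "STK19", "STK38", "STL"), each = 3),
  stringsAsFactors = FALSE
)

genes <- c("SSR1", "STK38")

# Pre-allocate one list slot per gene instead of overwriting a single variable.
pieces <- vector("list", length(genes))
for (i in seq_along(genes)) {
  pieces[[i]] <- data[data$geneSymbol == genes[i], ]
}

# Bind all subsets in one call, then renumber the rows.
genes1 <- do.call(rbind, pieces)
rownames(genes1) <- NULL
print(genes1)
```

Unlike the %in% subset, this version returns the rows grouped in the order given by genes.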
I want the result to look like this:
chrom txStart ExonCount geneSymbol
chr6 7281287 8 SSR1
chr6 7295624 8 SSR1
chr6 7298155 8 SSR1
chr6 36461668 14 STK38
chr6 36464487 14 STK38
chr6 36465556 14 STK38
I'm sure that the solution is very easy, but I've looked all over the web for the past few days without finding a solution.