I am trying to filter genes from a large data set, but when I run the code below, the output contains fewer genes than I entered. How can I get all the genes I am filtering for to appear in my data frame?

library(dplyr)
df <- read.csv("05.15.19.MT_AdultVInfant.wadjp.csv", header = TRUE, row.names = NULL)
data <- as.data.frame(df %>%
    filter(gene == 'FMO3' | gene == 'MRC2' | gene == 'GPRC5A' | gene == 'ATP1A2' |
           gene == 'RRAGD' | gene == 'LZTS1' | gene == 'EML1' | gene == 'SYT1' |
           gene == 'MGAT4A' | gene == 'TEAD2' | gene == 'BRINP1' | gene == 'PLOD1' |
           gene == 'IRAK3' | gene == 'UNC13D' | gene == 'KCNK10' | gene == 'DOK5' |
           gene == 'PLCB4' | gene == 'CACNA1F' | gene == 'PTN' | gene == '10orf54' |
           gene == 'CYP2C18' | gene == 'CPD' | gene == 'ALDH3A1' | gene == 'CHPT1'))

There are 24 genes here but the output has only 3. For this example, I expect to see all 24 genes as they are all present in the original data set.

user3837868
Noah_Seagull
Without being able to see your data, the best anyone can really do is guess. [See here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on making a reproducible example – camille May 24 '19 at 17:54

1 Answer

We can create a vector of gene names and use %in% to filter:

library(dplyr)
out <- df1 %>%
          filter(gene %in% v1)
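As a minimal sketch of the same idea on toy data: the data frame, the padj column, and the shortened gene list below are hypothetical stand-ins, not the contents of the CSV from the question. The setdiff() call at the end is one way to check which requested names never matched a row, for example because of a typo, stray whitespace, or a different gene symbol in the file.

```r
library(dplyr)

# Hypothetical toy data standing in for the CSV from the question
df1 <- data.frame(
  gene = c("FMO3", "MRC2", "GPRC5A", "ACTB", "GAPDH"),
  padj = c(0.01, 0.20, 0.03, 0.50, 0.70),
  stringsAsFactors = FALSE
)

# Genes of interest (shortened, illustrative list)
v1 <- c("FMO3", "MRC2", "GPRC5A", "PTN")

# Keep only the rows whose gene appears in v1
out <- df1 %>%
  filter(gene %in% v1)

# Requested names that have no matching row in the data
missing_genes <- setdiff(v1, df1$gene)
```

If missing_genes comes back non-empty on the real data, the filter is working and the difference lies in how those names are spelled in the file.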

A tbl_df prints only the first few rows by default, so we can either convert the result to a data.frame

data.frame(out)

or change the print options (tibble.print_max) for tibble to show more rows

where

v1 <- c('FMO3', 'MRC2', 'GPRC5A', ...)
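The display options mentioned above can be sketched as follows. The 24-row tibble is a hypothetical stand-in for the filtered result; this only changes how many rows are printed, not how many rows the filter returns.

```r
library(tibble)

# Hypothetical 24-row tibble standing in for the filtered result
out <- tibble(gene = paste0("gene", 1:24), value = seq_len(24))

# Option 1: print every row for this one call
print(out, n = Inf)

# Option 2: raise the session-wide limit so longer tibbles print in full
options(tibble.print_max = Inf)
print(out)

# Option 3: convert to a plain data.frame, which prints all of its rows
out_df <- as.data.frame(out)
print(out_df)
```

Checking nrow(out) is a quick way to tell a display limit apart from rows that were genuinely filtered out.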
akrun