I'm very new to R and need help excluding a list of contaminating genes from my transcriptome.
For example,
I have a file called genes.txt that contains a column of gene names along with a column of the corresponding gene sequences:
Name Sequence
Cluster1 TACGATCGATCGATCG.....
Cluster2 ATCGATCGATCGATCG.....
etc...
I have another file called contam.txt that is a list of gene names that need to be excluded from my master gene list:
Name
Cluster1
Cluster5
etc...
I need to remove every row in the gene file whose name appears in the contam file. This is the code I'm trying:
#set working directory
setwd("C:/MyR/transcriptome")
#using data.table
library(data.table)
#load gene file
gene <- as.data.table(read.table("gene.txt",stringsAsFactors=FALSE,
header=TRUE))
#set the key
setkey(unigene, Name)
#load contamination file
contam <- as.data.table(read.table("contam.txt",stringsAsFactors=FALSE,
header=TRUE))
#remove contaminants from unigene file
unigene_new <- unigene[!unigene$Name %in% contam,]
#or
unigene_new[-unigene[contam, which=TRUE]]
My code does not remove the unwanted genes from my list. Does anyone know what I'm doing wrong?
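For reference, here is a minimal sketch of the filtering I am after, using small made-up tables in place of the real files (the cluster names and sequences below are hypothetical). It assumes both tables have a column called Name. One likely difference from my attempt is that it compares against the Name column of contam (contam$Name) rather than the whole table; the second variant uses data.table's not-join syntax and is only one possible way to write it.

```r
library(data.table)

# Hypothetical stand-ins for the gene and contaminant files
unigene <- data.table(
  Name     = c("Cluster1", "Cluster2", "Cluster5", "Cluster7"),
  Sequence = c("TACG", "ATCG", "GGCC", "TTAA")
)
contam <- data.table(Name = c("Cluster1", "Cluster5"))

# Variant 1: keep rows whose Name is not in the contaminant Name column
unigene_new <- unigene[!Name %in% contam$Name]

# Variant 2: not-join (anti-join) on the shared Name column
unigene_new2 <- unigene[!contam, on = "Name"]

print(unigene_new)
```

With the toy data, both variants should leave only the rows for Cluster2 and Cluster7.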