I'm very new to R and need help excluding a list of contaminating genes from my transcriptome.

For example,

I have a file called genes.txt that contains a column of gene names along with a column of corresponding gene sequences:

Name Sequence

Cluster1 TACGATCGATCGATCG.....

Cluster2 ATCGATCGATCGATCG.....

etc...

I have another file called contam.txt that is a list of gene names that need to be excluded from my master gene list:

Name

Cluster1

Cluster5

etc...

I need to eliminate every row in the gene file that corresponds to a cluster in the contam file. This is the code I'm trying:

#set working directory
setwd("C:/MyR/transcriptome")

#using data.table
library(data.table)

#load gene file
gene <- as.data.table(read.table("gene.txt",stringsAsFactors=FALSE, 
header=TRUE))

#set the key
setkey(unigene, Name)

#load contamination file
contam <- as.data.table(read.table("contam.txt",stringsAsFactors=FALSE, 
header=TRUE))

#remove contaminants from unigene file
unigene_new <- unigene[!unigene$Name %in% contam,]
#or
unigene_new[-unigene[contam, which=TRUE]]

My code does not remove the unwanted genes from my list. Does anyone know what I'm doing wrong?

mor-T123
  • try an `anti-join`. If your `contam` table has a column called `Name` it will be something like `gene[ !contam, on = "Name"]` – SymbolixAU Jan 03 '18 at 23:22
  • see [this question](https://stackoverflow.com/q/33666971/5977215) for examples of how to write a useful question on SO (i.e., with example data that others can use), and also the answer which applies to you too. And also [this answer](https://stackoverflow.com/a/28703077/5977215) which gives a couple of `data.table` solutions – SymbolixAU Jan 03 '18 at 23:28
  • The `gene[ !contam, on = "Name"]` code worked! Thank you so much! – mor-T123 Jan 04 '18 at 00:14
  • You're welcome. I recommend going through the other questions and answers I've linked to as well - they will help your understanding. – SymbolixAU Jan 04 '18 at 00:18
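
For anyone landing here later, a minimal sketch of the anti-join suggested in the comments, assuming both tables have a `Name` column. The two small tables are hypothetical stand-ins for the gene and contam files, with made-up sequences, and `gene_clean` / `gene_clean2` are illustrative names. As far as I can tell, the original attempt fails for two reasons: the data is loaded as `gene` but the later lines refer to `unigene`, and `%in% contam` compares the names against the whole table rather than against the column `contam$Name`.

```r
library(data.table)

# Hypothetical stand-ins for the gene and contam files (sequences are invented)
gene <- data.table(
  Name     = c("Cluster1", "Cluster2", "Cluster5", "Cluster7"),
  Sequence = c("TACG", "ATCG", "GGCA", "TTAG")
)
contam <- data.table(Name = c("Cluster1", "Cluster5"))

# Anti-join: keep the rows of gene whose Name has no match in contam
gene_clean <- gene[!contam, on = "Name"]

# Same result with %in%, comparing against the column contam$Name
# instead of the whole contam table
gene_clean2 <- gene[!(Name %in% contam$Name)]

print(gene_clean)
```

The `on = "Name"` form does not need `setkey()` first, so the keying step in the question can be dropped if this is the only join.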