2

I am trying to find a way to retrieve the coding sequence (CDS) of a specific gene of interest and load it into R. I tried my luck with the BioMart package, but it doesn't let me specify which gene I am interested in.

Any help is appreciated!

Best, Heiko

StupidWolf
  • 45,075
  • 17
  • 40
  • 72
Heiko
  • 65
  • 4

1 Answers1

3

This should work:

library(biomaRt)
library(Biostrings)
mart <- useMart("ensembl", dataset="hsapiens_gene_ensembl")
cds_seq = getSequence(id = "APOE", 
                   type = "hgnc_symbol", 
                   seqType = "cdna", 
                   mart = mart)

We can translate the CDS:

AAs = sapply(cds_seq$coding,function(i)if(i=="Sequence unavailable"){NA}else{translate(DNAString(i))})

Get peptide sequence:

pep_seq = getSequence(id = "APOE", 
                   type = "hgnc_symbol", 
                   seqType = "peptide", 
                   mart = mart)

And check they are similar:

lapply(which(pep_seq$peptide!="Sequence unavailable"),function(i){
pep_seq$peptide[i] == as.character(AAs[[i]])
})

[[1]]
[1] TRUE

[[2]]
[1] TRUE

[[3]]
[1] TRUE

[[4]]
[1] TRUE

If you want to get the refseq, do:

cds_seq = getSequence(id = "NM_000041", 
                      type = "refseq_mrna", 
                      seqType = "coding", 
                      mart = mart)
StupidWolf
  • 45,075
  • 17
  • 40
  • 72
  • Thank you so much for your help! If I run mart <- useMart("ensembl", dataset="hsapiens_gene_ensembl") cds_seq = getSequence(id = "APOE", type = "hgnc_symbol", seqType = "cdna", mart = mart) it gives me a list with several cDNAs in it... Is there a way to specify and only load one specific refseq transcript? – Heiko Mar 11 '20 at 12:16
  • @Heiko, you have to know the refseq mrna id, see above edited answer – StupidWolf Mar 11 '20 at 12:44
  • cool, glad you got it going. If you feel that the answer has solved your problem, you can consider accepting it. https://stackoverflow.com/help/someone-answers . Have a great day :) – StupidWolf Mar 17 '20 at 09:23
  • Thanks again, StupidWolf! – Heiko Mar 20 '20 at 16:41
  • Dear StupidWolf, Sometimes I get an error saying: Error in .processResults(postRes, mart = mart, sep = sep, fullXmlQuery = fullXmlQuery, : Query ERROR: caught BioMart::Exception::Database: Error during query execution: Got error 28 from storage engine Do you have any idea what that means? I does not produce my cds_seq file then... – Heiko Mar 31 '20 at 11:20
  • are you outta disk space? https://stackoverflow.com/questions/10631387/1030-got-error-28-from-storage-engine – StupidWolf Mar 31 '20 at 11:22
  • 1
    Hey StupidWolf, I was able to solve the issue by changing the host... Thanks though! – Heiko Apr 06 '20 at 13:45