5

I am trying to get a FASTA file form NCBI website, I use the following function

getncbiseq <- function(accession){
  dbs <- c()
  for (i in 1:numdbs){
    db <- dbs[i]
    choosebank(db)
    resquery <- try(query(".tmpquery", paste("AC=", accession)),silent = TRUE)
    if (!(inherits(resquery, "try-error"))){
      queryname <- "query2"
      thequery <- paste("AC=",accession,sep="")
      query(`queryname`,`thequery`)
      # see if a sequence was retrieved:
      seq <- getSequence(query2$req[[1]])
      closebank()
      return(seq)
    }
    closebank()
  }
  print(paste("ERROR: accession",accession,"was not found"))
}    

When I try to retrieve the sequence

mydata <- getncbiseq("NC_001477")

Error in getSequence(query2$req[[1]]) : object 'query2' not found

Is there a better way to shorten these loop function also ?

if I use

query('queryname','the query')
#or 
query("queryname","thequery")

I get another error

Error in query("queryname", "thequery") : invalid request:"unknown list at (^): \"(^)thequery\""

zx8754
  • 52,746
  • 12
  • 114
  • 209
nik
  • 2,500
  • 5
  • 21
  • 48

2 Answers2

3

Thanks for great help. I was stuck at this point for a full day. I finally got the following code worked under windows 10 with R3.4.0(32bits):-

getncbiseq <- function(accession)
{
require("seqinr") # this function requires the SeqinR R package
# first find which ACNUC database the accession is stored in:
dbs <- c("genbank","refseq","refseqViruses","bacterial")
numdbs <- length(dbs)
for (i in 1:numdbs)
{
db <- dbs[i]
choosebank(db)
# check if the sequence is in ACNUC database 'db':
resquery <- try(query(".tmpquery", paste("AC=", accession)), silent = TRUE)

if (!(inherits(resquery, "try-error"))) {
  queryname <- "query2"
  thequery <- paste("AC=", accession, sep="")
  query2 <- query(queryname, thequery)
  # see if a sequence was retrieved:
  seq <- getSequence(query2$req[[1]])
  closebank()
  return(seq)
}
closebank()
}
print(paste("ERROR: accession",accession,"was not found"))
}
2

I think you intended to assign your call to query() to a variable called query2, but you forgot to do it. Try this:

if (!(inherits(resquery, "try-error"))) {
  queryname <- "query2"
  thequery <- paste("AC=", accession, sep="")
  query2 <- query(queryname, thequery)
  # see if a sequence was retrieved:
  seq <- getSequence(query2$req[[1]])
  closebank()
  return(seq)
}

As you mentioned, the rest of your code also has some quirks and kinks which could probably be improved upon.

Update:

Here is a refactor of your code using sapply on the dbs vector instead of an explicit for loop (the latter which is usually frowned upon by R people):

processdbs <- function(x, y) {
    choosebank(x)
    resquery <- try(query(".tmpquery", paste("AC=", y)), silent = TRUE)
    if (!(inherits(resquery, "try-error"))) {
      queryname <- "query2"
      thequery  <- paste("AC=", y, sep="")
      query2 <- query(queryname, thequery)

      # see if a sequence was retrieved:
      seq <- getSequence(query2$req[[1]])
      closebank()
      return(seq)
    }
    closebank()
}

getncbiseq <- function(accession) {
   dbs <- c("genbank","refseq","refseqViruses","bacterial")
   result <- sapply(dbs, processdbs, y=accession)
   closebank()

   print(paste("ERROR: accession",accession,"was not found"))
}

You may have to do a slight amount of additional work to inspect the result vector and determine whether a sequence was retrieved anywhere.

Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360
  • 1
    that is perfect Tim, it works, do you know by any chance a way to shorten the loop ? if no, no problem I liked and accepted your answer anyway – nik Jun 16 '16 at 10:15
  • I don't know if it can be shortened, but you could instead use an `apply()` function on the `dbs` vector, and eliminate the explicit loop completely. – Tim Biegeleisen Jun 16 '16 at 10:17