I'm trying to access the NCBI SRA database, query it for a list of IDs and save the output to a matrix.

I'm using the SRAdb package from Bioconductor to do this. I can now access and query the database, but it's really slow and I can't figure out how to save the loop output.

The file GPL11154_GSMs.txt contains the IDs I'm interested in. It looks like this:

GSM616127
GSM616128
GSM616129
GSM663427
GSM665037

What I have now only prints the result; gsm_to_srr is overwritten on every iteration, so nothing is kept:

#source("https://bioconductor.org/biocLite.R")
#biocLite("SRAdb")
library(SRAdb)

#connect to database
sqlfile <- getSRAdbFile()
sra_con <- dbConnect(SQLite(),sqlfile)


## lists all the tables in the SQLite database
sra_tables <- dbListTables(sra_con)
sra_tables


dbGetQuery(sra_con,'PRAGMA TABLE_INFO(study)')

## checking the structure of the tables
#dbListFields(sra_con,"experiment")
#dbListFields(sra_con,"run")



#read in file with sample IDs per platform
x <- scan("GPL11154_GSMs.txt", what="", sep="\n")
gsm_list <- strsplit(x, "[[:space:]]+")  # Separate elements by one or more whitespace characters
for (gsm in gsm_list) {
  # gsm_to_srr is replaced on each pass, so only the last result survives the loop
  gsm_to_srr <- getSRA(search_terms = gsm, out_types = c("submission", "study", "sample", "experiment", "run"), sra_con)
  print(gsm_to_srr)
}

1 Answer

Use lapply instead of a for loop so that each result is kept. Try:

res <- lapply(gsm_list, function(gsm){
  getSRA(search_terms = gsm,
         out_types = c("submission", "study",
                       "sample","experiment", "run"),
         sra_con) })

According to the manual, getSRA should return a data.frame, so res will be a list of data.frames. If we need to convert the list of data.frames into one data.frame, see this post on how to do it.
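As a minimal sketch of that last step, the list can be stacked with do.call(rbind, ...) and written to disk with write.csv. The res list below is a hand-made stand-in, not real getSRA output: the column names, the placeholder run values and the output file name are all assumptions for illustration, and rbind only works this way when every data.frame has the same columns.

```r
# Stand-in for the list returned by lapply(); columns and values are
# hypothetical placeholders, not real getSRA() output
res <- list(
  data.frame(sample_alias = "GSM616127", run = "run_placeholder_1",
             stringsAsFactors = FALSE),
  data.frame(sample_alias = "GSM616128", run = "run_placeholder_2",
             stringsAsFactors = FALSE)
)

# Stack the data.frames row-wise; all elements must share the same column names
res_df <- do.call(rbind, res)

# Save the combined table (hypothetical file name, written to a temp directory)
out_file <- file.path(tempdir(), "gsm_to_srr.csv")
write.csv(res_df, out_file, row.names = FALSE)
```

If some queries return no rows or differently shaped results, those elements may need to be filtered out of res before binding.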

  • Yeah, I converted res.df = as.data.frame(do.call(rbind, res)) and then saved. It worked perfectly. Thanks – MenieM Nov 10 '16 at 13:12