4

I intend to use "foreach" to uitlize all the cores in my CPU. The catch is i need to send a sql query inside the loop. The script is working fine with normal 'for' loop, but it is giving following error when i change it to 'foreach'. The error is :

select: Interrupted system call    
select: Interrupted system call    
select: Interrupted system call    
Error in { : task 1 failed - "expired MySQLConnection"

The code i used is :

library(foreach)
library(doMC)
library(RMySQL)
library(multicore)
registerDoMC(cores=6)
m <- dbDriver("MySQL", max.con = 100)
con <- dbConnect(m, user="*****", password = "******", host ="**.**.***",dbname="dbname")
list<-dbListTables(con)
foreach(i = 1:(length(list))%dopar%{
  query<-paste("SELECT * FROM ",list[i]," WHERE `CLOSE` BETWEEN 1 AND 100",sep="")
  t<-dbGetQuery(con,query)
}

Though 'foreach' is working fine in my system for all other purposes, it is giving error only in case of sql queries. Is there a way to send sql queries inside 'foreach' loop?

rogaldh
  • 268
  • 1
  • 9
rajesh
  • 41
  • 4
  • 2
    This comment is peripheral to your question: Using four cores to spit queries at one database won't make the database itself any faster. If the database is the bottleneck, finding a faster (parallel) way to fire queries at it isn't going to get your results any faster. Maybe you've already done this, but you ought to find a way of determining if the database will keep up before wasting time threading queries. – DavidO Jun 20 '11 at 06:39
  • David0 is right. Using multiple cores speeds up processor bound tasks but database querying seldom fits the bill. – David Heffernan Jun 20 '11 at 06:56
  • Thanks for your comments. But, i won't be using 'foreach' looping construct to speed up database queries. I will be needing multiple cores to speed up the arithmetic on the result of the queries. I didn't include that code above, because, 'foreach' is failing at SQL query itself. – rajesh Jun 20 '11 at 08:31
  • 4
    @rajesh -- it looks like you're using the same db connection for all queries. Should you open and close a unique connection within the loop? – Richard Herron Jun 20 '11 at 09:13
  • thanks richard. It worked. Apparently, both driver and connection need to be initiated inside the loop. I am still getting the warning message as mentioned above "select : Interrupted system call" without any error message. If it has something to do with kernel, FYI: I am running the script in Ubuntu 10.04 – rajesh Jun 20 '11 at 16:31
  • @rajesh -- I'm not sure what that means. Maybe update the question to show the new problem? – Richard Herron Jun 21 '11 at 15:32

1 Answers1

0

My suggestion is this: Move the database queries outside the loop, and lock access so you dont do parallel database queries. I think that will speed things up too, as you won't have parallel disk access, while still being able to do parallel processing.

Meaning (pseudo code) db = connect to database threadlock = lock();

parfor { threadlock.lock result = db query (pull all data here, as you cant process while you load without keeping the database locked) thread.unlock process resulting data (which is now just data, and not a sql object). }

neuron
  • 1,224
  • 12
  • 14