1

I am operating on chunks of panel data with about 3*10^6 rows and 11 columns. This data is stored in a SQL database (the data I use is a subset of an even larger dataset). I want to process chunks of the data (each date seperately) without importing the entire thing all at once, but the operations I have to run on each chunk are computationally intensive, so I would like to use the foreach package.

However, I am running into some problems with running the sql call in parallel. If I do a simple call such as (sqlQuery(channel, "select 1")) I can run that in parallel. If I do a more complex statement I get SIGPIPE errors. Has anyone else run into similar problems?

joran
  • 169,992
  • 32
  • 429
  • 468
rlh2
  • 1,039
  • 2
  • 9
  • 16

1 Answers1

1

You did not tell us what database you were using. On a SQLServer connected with RODBC, if have successfully done this by using transactions.

channel = odbcConnect(database)
odbcSetAutoCommit(channel,FALSE)
# Make requests here
odbcEndTran(channel,TRUE) 

I am not sure if this will also help when you have only read request as in your case, but it might be worth giving a try because it is only twp additional lines. I asked about the server, because transactions work nicely with MSSqlServer under ODBC, and I had some mixed results with others (but did not try hard).

Dieter Menne
  • 10,076
  • 44
  • 67
  • Sorry about that, it is MSFT SQLServer. Your results are encouraging, What exactly do you mean by "using transactions?" – rlh2 Nov 23 '11 at 16:53