
I have a query that I would like to run in parallel using the multiprocessing.Pool module.

My question is: how can I submit the query, along with the corresponding binding variables, to pool.map()?

So if the query is something like:

SELECT a.col1, a.col2, t.col1
FROM table a,
     (VALUES (:x1, :x2), (:y1, :y2), (:z1, :z2)) AS t(col1, col2)
WHERE a.col2 = t.col2 AND a.col1 % t.col1

and a dict of binding variables for each query:

{'x1': 'AA', 'x2': 'XX', 'y1': 'BB', 'y2': 'YY', 'z1': 'CC', 'z2': 'ZZ'}

How can I invoke pool.map(..) on this? I ask because pool.map() only supports one-argument worker functions.

Now, if I WASN'T using binding variables, I would invoke it like this:

import psycopg2
from multiprocessing import Pool

def execSQL(sql_qry):
    # init db_name, db_host, db_user, db_pwd here
    ...
    conn = psycopg2.connect("dbname=%s host=%s user=%s password=%s" % (db_name, db_host, db_user, db_pwd))
    curs = conn.cursor()
    curs.execute(sql_qry)
    records = curs.fetchall()
    curs.close()
    conn.close()
    return records

def run():
    sql_queries = [...]  # some list of queries to be run in parallel
    pool = Pool(processes=4)
    results = pool.map(execSQL, sql_queries)
    return results
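
For binding variables, the workaround I can think of is to bundle each query with its bind dict into a single tuple, so the worker still takes just one argument. Here is a rough, untested sketch (execSQLWithParams and the jobs list are only illustrative; note that psycopg2 itself expects %(name)s-style placeholders rather than :name, so the query would have to be rewritten accordingly):

import psycopg2
from multiprocessing import Pool

def execSQLWithParams(args):
    # pool.map() passes a single argument, so the query and its
    # bind variables travel together as one (sql, params) tuple
    sql_qry, bind_vars = args
    # init db_name, db_host, db_user, db_pwd here
    conn = psycopg2.connect("dbname=%s host=%s user=%s password=%s" % (db_name, db_host, db_user, db_pwd))
    curs = conn.cursor()
    curs.execute(sql_qry, bind_vars)  # second argument supplies the bindings
    records = curs.fetchall()
    curs.close()
    conn.close()
    return records

def run():
    # e.g. ~500 value pairs split into 4 chunks of ~125 pairs each,
    # one (query, bind dict) tuple per chunk
    jobs = [(qry1, params1), (qry2, params2), (qry3, params3), (qry4, params4)]
    pool = Pool(processes=4)
    results = pool.map(execSQLWithParams, jobs)
    pool.close()
    pool.join()
    return results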
  • Are you sure you're not looking for a join? – Eric Jun 28 '16 at 21:56
  • Nope, pretty sure. – femibyte Jun 28 '16 at 22:07
  • What are you trying to accomplish? – univerio Jun 29 '16 at 00:06
  • My question is this: for the multiprocessing module, how do I invoke pool.map(..) on a query for which I have to supply binding variables? I ask this because Pool.map() and friends only support one-argument worker functions – femibyte Jun 29 '16 at 02:33
  • Again, what are you trying to accomplish? Why are you trying to do this? If the query is IO-bound, multiprocessing won't save you; if the query is CPU-bound, it's bound by the DB machine, so multiprocessing won't save you either. – univerio Jun 29 '16 at 06:25
  • The query is CPU-bound. My problem is that the query is slow because the number of binding variable pairs is extremely large (~500). So assuming I am on a 4-CPU machine and I split the query into 4 queries of 125 binding variable pairs each, why shouldn't it work? See this thread for an example: http://stackoverflow.com/questions/23157909/how-to-use-threads-within-a-flask-route – femibyte Jun 29 '16 at 06:33
  • Where do these binding variables come from? Also, your list of binding variables is not valid python code – Eric Jun 29 '16 at 15:19
  • Ok, I've corrected it. It's a dict of binding variables. The binding variables are the actual string values that will be substituted into the VALUES clause of the sql. This is postgresql. – femibyte Jun 29 '16 at 18:35
  • 500 params is not a large number at all, certainly not enough to make it CPU-bound, IMO, unless you are performing some ungodly amount of computation in your `WHERE` clause (although it wouldn't hurt to test whether parallelism *does* help, but I would wager "no"). It would help us solve your problem if, instead of describing to us [Y](http://xyproblem.info/), your perceived solution, you could give us some details on [X](http://xyproblem.info/), your original problem. – univerio Jun 29 '16 at 20:13
  • OTOH, if you just want to get on with your day and solve Y, you can "pre-bind" the query using [`.bindparams()`](http://docs.sqlalchemy.org/en/latest/core/sqlelement.html#sqlalchemy.sql.expression.TextClause.bindparams) (see the sketch after these comments). – univerio Jun 29 '16 at 20:13
  • For X, take a look at this : http://stackoverflow.com/questions/37975897/maximum-number-of-bind-parameters-in-sqlalchemy It seems like I may find a solution here : http://stackoverflow.com/questions/25553919/passing-multiple-parameters-to-pool-map-function-in-python – femibyte Jun 29 '16 at 20:53
  • Have you done an `EXPLAIN ANALYZE` to see what the query plan is? – univerio Jun 29 '16 at 21:17
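
Following up on the .bindparams() suggestion in the comments above, here is roughly what I understand the pre-binding approach to look like (untested sketch; the connection URL is a placeholder, and the literal % similarity operator is doubled to %% because the psycopg2 driver uses %-style parameters):

from sqlalchemy import create_engine, text

engine = create_engine("postgresql://db_user:db_pwd@db_host/db_name")  # placeholder URL

# pre-bind the values into the statement itself, per the comment above;
# the literal % operator is written as %% for the psycopg2 driver
stmt = text(
    "SELECT a.col1, a.col2, t.col1 FROM table a, "
    "(VALUES (:x1, :x2), (:y1, :y2), (:z1, :z2)) AS t(col1, col2) "
    "WHERE a.col2 = t.col2 AND a.col1 %% t.col1"
).bindparams(x1='AA', x2='XX', y1='BB', y2='YY', z1='CC', z2='ZZ')

with engine.connect() as conn:
    records = conn.execute(stmt).fetchall()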

0 Answers