The following class Downloader
is supposed to query a SQL database multiple times and store the results in a list of pandas.DataFrame
objects.
I would like to use multiprocessing
to speed up the retrieval, however I get the error
line 53, in run_queries
dfs_queries = p.map(run_query, queries)
File "/usr/lib/python2.7/multiprocessing/pool.py", line 251, in map
return self.map_async(func, iterable, chunksize).get()
File "/usr/lib/python2.7/multiprocessing/pool.py", line 567, in get
raise self._value
PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
I have looked into this question which suggests that the pyodbc connection and cursor objects can not be pickled. Is there a way to still use the pool.map(f, arglist)
from multiprocessing
when f
is reliant on a sql connection?
import pyodbc
from multiprocessing import Pool as ThreadPool
import pandas as pd
class Downloader(object):
def _connect(self, path_db_config):
# ... Loads a config file from which it gets dsn, user and password ... #
con_string = 'DSN=%s;UID=%s;PWD=%s;' % (dsn, user, password)
return pyodbc.connect(con_string)
def run_queries(self):
queries = [# List of sql queries #]
p = ThreadPool(len(queries))
def run_query(query):
cnxn = self._connect(PATH_DB_CONFIG)
df = pd.read_sql(query, cnxn)
return df
return p.map(run_query, queries)
Thanks for the help!!