I am trying to process some data coming back from my database in parallel.
I run a query and get a resultset.
results = dbsession.query(table1, table2).outerjoin(table1.fk_id).limit(10).all()
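For context, dbsession comes from a pretty standard engine/sessionmaker setup, roughly like this (the connection string and DSN here are placeholders for my real MSSQL server):

import sqlalchemy as sa
from sqlalchemy.orm import sessionmaker

# Placeholder connection string; the real one points at our MSSQL server.
engine = sa.create_engine('mssql+pyodbc://user:password@my_dsn')
Session = sessionmaker(bind=engine)
dbsession = Session()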
I see all of my data and fields in PyCharm's debugger and object viewer. One of these fields is an XML object (an XML column in MSSQL). I found code to make a custom type in Python for it (code below).
I then send that result list to my multiprocessing function via pool.map.
import multiprocessing as mp
from functools import partial

pool = mp.Pool(processes=4)
mp_process_results = pool.map(partial(test_mp), results)
My multiprocessing function:
def test_mp(resultset):
    processed_file = Generic_translator(resultset).translate_record()
    return processed_file
Once the object gets to my test_mp() function, the raw_xml data is no longer in the raw_xml field. The other fields (e.g. id, filename, basename, filesize) are all present and available. Unfortunately, I need the raw_xml field.
I assume this is because the processes don't share the same memory space. However, I'm not sure how I can make that data available to the worker processes.
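My understanding is that pool.map pickles each item in results before handing it to a worker process, so anything that doesn't survive pickling gets lost at that boundary. As a standalone sanity check (this isn't part of my pipeline), the lxml element by itself doesn't pickle, while its serialized form does:

import pickle
from lxml import etree

elem = etree.fromstring('<root><child/></root>')

try:
    pickle.dumps(elem)  # lxml elements don't support pickling
except TypeError as exc:
    print('cannot pickle element:', exc)

# Serializing to bytes first round-trips through pickle fine.
raw = etree.tostring(elem)
restored = etree.fromstring(pickle.loads(pickle.dumps(raw)))

Here is the custom type I made for the XML column: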
import logging

import sqlalchemy.types as types
from lxml import etree


class XMLType(types.UserDefinedType):
    """Custom SQLAlchemy type for the MSSQL XML column."""

    def get_col_spec(self):
        return 'XML'

    def bind_processor(self, dialect):
        def process(value):
            # Outgoing values: pass strings through, serialize lxml elements.
            if value is None:
                return None
            if isinstance(value, str):
                return value
            return etree.tostring(value)
        return process

    def result_processor(self, dialect, coltype):
        def process(value):
            # Incoming values: parse the XML text into an lxml element.
            if value is not None and value != '':
                try:
                    value = etree.fromstring(value)
                except etree.XMLSyntaxError as exc:
                    logging.error("Syntax error in XML file: %s", value)
                    logging.error("XML result is probably NULL")
                    logging.error("XMLSyntaxError: %s", exc)
            return value
        return process
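For reference, the raw_xml column on my model is declared with this custom type, roughly like the sketch below (the class name and the other columns are placeholders for my real schema, which also has the foreign key used in the join):

from sqlalchemy import Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

# Placeholder sketch; the real model has more columns
# (filename, basename, filesize, ...).
class Table1(Base):
    __tablename__ = 'table1'
    id = Column(Integer, primary_key=True)
    filename = Column(String(255))
    raw_xml = Column(XMLType)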