
I am trying to process some data coming back from my database in parallel.

I run a query and get a resultset.

results = dbsession.query(table1, table2).outerjoin(table1.fk_id).limit(10).all()

I can see all of my data and fields in PyCharm's debugger and object viewer. One of these fields is an XML object (an XML column in MSSQL). I found code to make a custom type for it in Python (code at the end of this post).

[Image: raw_xml element]

I then pass that list of results to my multiprocessing function via pool.map.

import multiprocessing as mp

pool = mp.Pool(processes=4)
mp_process_results = pool.map(test_mp, results)  # partial() with no bound arguments was redundant

My multiprocessing function:

def test_mp(resultset):
    processed_file = Generic_translator(resultset).translate_record()
    return processed_file

Once the object reaches my test_mp() function, the raw_xml field no longer contains its data; it has disappeared. The other fields (e.g. id, filename, basename, filesize) are all present and available. Unfortunately, I need the raw_xml field.

I assume this is because the processes don't share the same memory space. However, I'm not sure how to make that data available to the other processes.
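To make that assumption concrete: pool.map pickles each argument in the parent process and unpickles it in the worker, so only data that survives a pickle round trip arrives intact. The sketch below is one possible workaround, not something verified against the setup in this question. It assumes the standard library's xml.etree.ElementTree in place of lxml, a plain dict in place of the SQLAlchemy result row, and hypothetical helper names (to_picklable, worker). The idea is to serialize the XML to bytes before handing the record over and to re-parse it on the other side.

```python
import pickle
import xml.etree.ElementTree as ET  # stand-in for lxml.etree in this sketch


def to_picklable(record):
    """Return a copy of the record with the XML element replaced by its serialized bytes."""
    plain = dict(record)
    if plain.get("raw_xml") is not None:
        plain["raw_xml"] = ET.tostring(plain["raw_xml"])
    return plain


def worker(plain_record):
    """Hypothetical worker body: re-parse the XML from bytes, then use it."""
    root = ET.fromstring(plain_record["raw_xml"])
    return plain_record["filename"], root.tag, root.findtext("size")


# A plain dict stands in for one result row (assumption for illustration).
record = {
    "id": 1,
    "filename": "a.xml",
    "raw_xml": ET.fromstring("<file><size>42</size></file>"),
}

# The pickle round trip imitates what pool.map does to each argument.
payload = pickle.loads(pickle.dumps(to_picklable(record)))
result = worker(payload)
```

With a real pool, the same conversion would be applied to every row before calling pool.map, and the parsing step would move into test_mp().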

import sqlalchemy.types as types
import lxml
import logging

from lxml import etree

class XMLType(types.UserDefinedType):

    def get_col_spec(self):
        return 'XML'

    def bind_processor(self, dialect):
        def process(value):
            if value is not None:
                if isinstance(value, str):
                    return value
                else:
                    return etree.tostring(value)
            else:
                return None
        return process

    def result_processor(self, dialect, coltype):
        def process(value):
            # Use != for the empty-string check; `is not ''` compares identity, not value.
            if value is not None and value != '':
                try:
                    value = etree.fromstring(value)
                except etree.XMLSyntaxError as exc:
                    logging.error("Syntax error in XML file: %s", value)
                    logging.error("XML result is probably NULL")
                    # Log the caught exception instance rather than the exception class.
                    logging.error("XMLSyntaxError: %s", exc)
            return value
        return process
Kevin Vasko