Say I have a program that looks like this:
from lxml import etree

class ParseXmlFile(object):
    def __init__(self, xml_to_parse):
        self.xml = etree.parse(xml_to_parse)

    def a(self):
        return self.xml.xpath('//something')

    def b(self):
        return self.xml.xpath('//something-else')
lxml frees the GIL, so it should be possible to run a and b concurrently in separate threads or processes.
From the lxml docs:
lxml frees the GIL (Python's global interpreter lock) internally when parsing from disk and memory...The global interpreter lock (GIL) in Python serializes access to the interpreter, so if the majority of your processing is done in Python code (walking trees, modifying elements, etc.), your gain will be close to zero. The more of your XML processing moves into lxml, however, the higher your gain. If your application is bound by XML parsing and serialisation, or by very selective XPath expressions and complex XSLTs, your speedup on multi-processor machines can be substantial.
I have done little to no work with multithreading.
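To make the goal concrete, the thread-based version I have in mind is roughly this (the filename is a placeholder, and the results dict is just one way I can think of to collect the return values):

import threading

# build the parser once and share it between the two threads
parser = ParseXmlFile('some_file.xml')  # placeholder filename

results = {}

def run(name, method):
    # each thread stores its method's return value under a name
    results[name] = method()

threads = [
    threading.Thread(target=run, args=('a', parser.a)),
    threading.Thread(target=run, args=('b', parser.b)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

I don't know whether the two xpath() calls would actually run in parallel here, given the GIL caveats quoted above.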
On the multiprocessing side, a run-of-the-mill implementation would use something like multiprocessing.Pool().map(), which seems to be of no use here, since I have a list of functions and a single argument rather than a single function and a list of arguments. Attempting to wrap each function in another function and then multiprocess, as described in one of the answers, raises the following exception:
cPickle.PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
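For reference, the failing attempt looked roughly like this (reconstructed from memory, so the wrapper names and details are illustrative):

import multiprocessing

def wrap(method):
    # wrap the bound method in a plain function before handing it to the pool
    def wrapper():
        return method()
    return wrapper

def call(func):
    # helper executed in the worker processes: just call whatever it is given
    return func()

parser = ParseXmlFile('some_file.xml')  # placeholder filename

pool = multiprocessing.Pool(processes=2)
# Pool.map() pickles the items it sends to the workers, and as far as I can
# tell the nested wrapper functions (and bound methods) are not picklable,
# which seems to be where the PicklingError comes from
results = pool.map(call, [wrap(parser.a), wrap(parser.b)])
pool.close()
pool.join()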
Is it possible to do what I'm describing? If so, how?