I'm writing a graph object to an XML representation. My single-process code works, but it's too slow on my large graph. I'm trying to parallelize it with `multiprocessing.Pool`, but I'm not getting the SubElements back from the pool: the `nodes` element stays empty. I'm sure I'm missing something obvious, but I'm new to Python.

import networkx as nx
import lxml.etree as et
from multiprocessing import Pool

G = nx.petersen_graph()

# For any graph, make a node subelement whose id is the node label.
# list(...) is needed because G.nodes() is not a plain list in networkx 2.x and later.
def getNodeAttributes(index):
    et.SubElement(nodes, "node", attrib={'id': str(list(G.nodes())[index])})

# Do it with one monolithic process
network = et.Element("network", attrib={"name": "Petersen Graph"})
nodes = et.SubElement(network, "nodes")

for i in range(len(G)):
    getNodeAttributes(i)

et.dump(network)
<network name="Petersen Graph">
  <nodes>
    <node id="0"/>
    <node id="1"/>
    <node id="2"/>
    <node id="3"/>
    <node id="4"/>
    <node id="5"/>
    <node id="6"/>
    <node id="7"/>
    <node id="8"/>
    <node id="9"/>
  </nodes>
</network>
# Do it again, but with pool.map in parallel
network = et.Element("network", attrib={"name": "Petersen Graph"})
nodes = et.SubElement(network, "nodes")

pool = Pool(4)
pool.map(getNodeAttributes, range(len(G)))
pool.close()
pool.join()

et.dump(network)
<network name="Petersen Graph">
  <nodes/>
</network>
gregmacfarlane
  • well you're not doing anything with the return value from `pool.map`, so...? – roippi Sep 18 '14 at 16:33
  • `SubElement` is supposed to create a pointer to `nodes`. There's no value to return, at least that I can find. – gregmacfarlane Sep 18 '14 at 16:39
  • Oh, I see. You should understand that `multiprocessing` involves forking *new processes* to do work - i.e. every process gets a *copy* of everything to work on. Any work that they do is lost if you don't somehow return it to your main process. – roippi Sep 18 '14 at 16:41
  • Yeah, this is what I'm trying to figure out. – gregmacfarlane Sep 18 '14 at 16:45
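
To make the point in the comments concrete, here is a minimal sketch of returning data from the workers and building the tree in the parent. It assumes the standard-library `xml.etree.ElementTree` as a stand-in for `lxml.etree`, and a plain list `NODE_LABELS` as a stand-in for the graph's node labels; `node_attributes` and `build_network` are hypothetical helper names, not part of the original code.

```python
import xml.etree.ElementTree as ET  # assumption: stdlib stand-in for lxml.etree
from multiprocessing import Pool

# Hypothetical stand-in for the graph's node labels (the Petersen graph has nodes 0..9).
NODE_LABELS = list(range(10))


def node_attributes(index):
    """Runs in a worker: return plain, picklable data instead of mutating the tree."""
    return {"id": str(NODE_LABELS[index])}


def build_network(attribute_dicts):
    """Runs in the parent: turn the returned dicts into SubElements."""
    network = ET.Element("network", attrib={"name": "Petersen Graph"})
    nodes = ET.SubElement(network, "nodes")
    for attrib in attribute_dicts:
        ET.SubElement(nodes, "node", attrib=attrib)
    return network


if __name__ == "__main__":
    with Pool(4) as pool:
        # pool.map returns the workers' return values in input order.
        results = pool.map(node_attributes, range(len(NODE_LABELS)))
    print(ET.tostring(build_network(results), encoding="unicode"))
```

The workers only compute attribute dictionaries; every `SubElement` call happens in the parent process, so nothing is lost when the workers exit. Whether this is faster than the single-process loop depends on how expensive the per-node work is compared with the cost of pickling the results.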

1 Answer

Use a queue (`multiprocessing.Queue`) to collect the results of your worker processes. See the answer to this question: "Sharing a result queue among several processes".

That said, I'm not sure it will help much in your case, since the XML file needs to be read and parsed sequentially, and the element tree is going to be quite large. But give it a try...
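
The queue approach suggested above could look roughly like the following sketch. It again assumes the standard-library `xml.etree.ElementTree` in place of `lxml.etree` and a plain list of labels in place of the graph; `worker` and `collect` are hypothetical names chosen for illustration.

```python
import xml.etree.ElementTree as ET  # assumption: stdlib stand-in for lxml.etree
from multiprocessing import Process, Queue


def worker(labels, out_queue):
    """Runs in a child process: put one (label, attributes) pair per node on the queue."""
    for label in labels:
        out_queue.put((label, {"id": str(label)}))


def collect(out_queue, expected):
    """Runs in the parent: read exactly `expected` results.

    Arrival order across processes is not guaranteed, so sort by label.
    """
    results = [out_queue.get() for _ in range(expected)]
    results.sort(key=lambda pair: pair[0])
    return [attrib for _, attrib in results]


if __name__ == "__main__":
    labels = list(range(10))  # hypothetical stand-in for the graph's node labels
    out_queue = Queue()
    chunks = [labels[i::4] for i in range(4)]  # split the work across 4 processes
    procs = [Process(target=worker, args=(chunk, out_queue)) for chunk in chunks]
    for p in procs:
        p.start()
    # Drain the queue before joining so a child is never blocked on a full pipe.
    attribs = collect(out_queue, len(labels))
    for p in procs:
        p.join()

    nodes = ET.Element("nodes")
    for attrib in attribs:
        ET.SubElement(nodes, "node", attrib=attrib)
    print(ET.tostring(nodes, encoding="unicode"))
```

As with any queue-based design, the tree itself is still assembled sequentially in the parent, which is the limitation mentioned above.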

alexis