ElementTree parallel node creation

Question

I'm writing a graph object to an XML representation. My monolithic code works well, but it's too slow on my large graph. I'm trying to parallelize it, but I'm not getting the SubElement back from the pool. I'm sure that I'm missing something obvious, but I'm new to Python.

import networkx as nx
import lxml.etree as et
from multiprocessing import Pool

G = nx.petersen_graph()

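# For any graph, make a node subelement with the id being the node label.
# list(G.nodes()) gives the labels by position; on newer networkx versions
# G.nodes()[index] looks up the node's attribute dict instead.
def getNodeAttributes(index):
    et.SubElement(nodes, "node", attrib={'id': str(list(G.nodes())[index])})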

# Do it with one monolithic process
network = et.Element("network", attrib={"name": "Petersen Graph"})
nodes = et.SubElement(network, "nodes")

for i in range(len(G)):
    getNodeAttributes(i)

et.dump(network)

<network name="Petersen Graph">
  <nodes>
    <node id="0"/>
    <node id="1"/>
    <node id="2"/>
    <node id="3"/>
    <node id="4"/>
    <node id="5"/>
    <node id="6"/>
    <node id="7"/>
    <node id="8"/>
    <node id="9"/>
  </nodes>
</network>

# Do it again, but with pool.map in parallel
network = et.Element("network", attrib={"name": "Petersen Graph"})
nodes = et.SubElement(network, "nodes")

pool = Pool(4)
pool.map(getNodeAttributes, range(len(G)))
pool.close()
pool.join()

et.dump(network)

<network name="Petersen Graph">
  <nodes/>
</network>

well you're not doing anything with the return value from `pool.map`, so...? — roippi, Sep 18 '14 at 16:33
`SubElement` is supposed to create a pointer to `nodes`. There's no value to return, at least that I can find. — gregmacfarlane, Sep 18 '14 at 16:39
Oh, I see. You should understand that `multiprocessing` involves forking *new processes* to do work - i.e. every process gets a *copy* of everything to work on. Any work that they do is lost if you don't somehow return it to your main thread. — roippi, Sep 18 '14 at 16:41

score 1 · Accepted Answer · edited May 23 '17 at 12:14

Use a queue (multiprocessing.Queue) to collect the results of your worker processes. See the answer to this question: Sharing a result queue among several processes.

That said, I'm not sure it will help much in your case, since the XML file needs to be read and parsed sequentially, and the element tree is going to be quite large. But give it a try...
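As a rough sketch of the "return it to the parent" idea from the comments: the worker below returns plain, picklable data, and only the parent process touches the tree. Note the substitutions, all of which are assumptions for illustration: it uses the return value of `pool.map` rather than a `multiprocessing.Queue` (a queue would carry the same kind of data), the standard-library `xml.etree.ElementTree` stands in for `lxml.etree`, a plain iterable of labels stands in for the networkx graph, and the names `node_attributes`, `attach_nodes` and `build_network` are hypothetical.

```python
import xml.etree.ElementTree as et  # stdlib stand-in for lxml.etree (assumption)
from multiprocessing import Pool


def node_attributes(label):
    """Worker: compute the attributes for one node and return plain data.

    An element created inside a worker would be attached to that process's
    own copy of the tree, so only a picklable dict of strings is sent back.
    """
    return {"id": str(label)}


def attach_nodes(parent, records):
    """Parent side: turn the returned attribute dicts into subelements."""
    for attrib in records:
        et.SubElement(parent, "node", attrib=attrib)
    return parent


def build_network(labels, processes=4):
    """Compute node attributes in a pool, then assemble the tree in the parent."""
    network = et.Element("network", attrib={"name": "Petersen Graph"})
    nodes = et.SubElement(network, "nodes")
    with Pool(processes) as pool:
        # pool.map returns the results in the same order as the inputs
        records = pool.map(node_attributes, labels)
    attach_nodes(nodes, records)
    return network
```

Call `build_network(range(10))` from under an `if __name__ == "__main__":` guard so that worker processes can import the module safely. Whether this is faster than the plain loop depends on how expensive the per-node work is; for a trivial worker like this one, the cost of passing data between processes will likely dominate.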

answered Sep 18 '14 at 17:24

alexis


Interesting. In my application the order of the XML parsing actually doesn't matter. – gregmacfarlane Sep 18 '14 at 17:26
I get that, but you still need to parse the file. But don't listen to me, try it out. – alexis Sep 18 '14 at 17:36
He doesn't seem to be parsing an XML file, but generating one. – Ross Ridge Sep 18 '14 at 17:48
Oops, I missed that! Thanks Ross. – alexis Sep 18 '14 at 17:50
