I've been struggling for many days now with a class, PublicationSaver, that I wrote. It has a method that loads XML documents as strings (not shown here) and then passes each loaded string to its method savePublication(self, publication, myDirPath).

Every time I have used it, it crashed after about 25,000 strings, and it saves the last string, the one on which it crashes. I was able to parse that string separately, so I suppose that the problem is not bad XML.

I asked about this here but got no answers.

I googled a lot and it seems that I'm not the only one having this problem: here

So, since I really need to complete this task, I thought of this: can I wrap everything in a thread started from main, so that when the lxml parse throws an exception I catch it and send a result to main, which then kills the thread and starts it again?

#threading
result_q = Queue.Queue()

# Create the thread 
xmlSplitter = XmlSplitter_Thread(result_q=result_q)
xmlSplitter.run(toSplit_DirPath, target_DirPath)

print "Hello !!!\n"

toSplitDirEmptyB=False

while not toSplitDirEmptyB:
  
    splitterAlive=True
    while splitterAlive:
        sleep(120)
        splitterAlive=result_q.get()
        
    xmlSplitter.join()
    print "*** KILLED XmlSplitter_Thread !!! ***\n"
    
    if not os.listdir(toSplit_DirPath):
        toSplitDirEmptyB=True
    else:
        xmlSplitter.run(toSplit_DirPath, target_DirPath)

Is this a valid approach? When I run the code above, it does not work: "Hello !!!" is never displayed, and the xmlSplitter just keeps going even when it starts to fail (there is an exception handler that keeps it going).

Marco Evasi

1 Answer

Probably the thread fails and it is blocking on the join method. Take a look here. Split the XML into chunks and try to parse each chunk to avoid memory errors.
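
As a rough sketch of the restart idea from the question (written for Python 3, where the module is named queue rather than Queue): calling run() directly executes the thread body in the calling thread, so nothing after that call runs until it returns, whereas start() launches a new thread. A Thread object can also be started only once, and join() only waits for it, so each restart needs a fresh instance and the worker has to end itself. The names parse_item, Worker and supervise are hypothetical, and parse_item merely stands in for the real lxml call.

```python
import queue
import threading


def parse_item(item):
    # Hypothetical stand-in for the real lxml parsing step; fails on "bad".
    if item == "bad":
        raise ValueError("cannot parse %r" % item)
    return item.upper()


class Worker(threading.Thread):
    """Processes items from `pending` and reports its outcome on `result_q`."""

    def __init__(self, pending, done, result_q):
        super().__init__()
        self.pending = pending
        self.done = done
        self.result_q = result_q

    def run(self):
        try:
            while self.pending:
                item = self.pending[0]
                self.done.append(parse_item(item))
                self.pending.pop(0)
        except Exception as exc:
            # Drop the offending item so a restarted worker does not loop on it,
            # then tell the main thread what happened and let this thread end.
            failed = self.pending.pop(0)
            self.result_q.put(("failed", failed, str(exc)))
        else:
            self.result_q.put(("finished", None, None))


def supervise(items):
    """Runs workers until every item is either processed or recorded as failed."""
    pending, done, failures = list(items), [], []
    result_q = queue.Queue()
    while True:
        # A Thread can be started only once, so build a new one per attempt.
        worker = Worker(pending, done, result_q)
        worker.start()  # start(), not run(): run() would block the caller
        status, item, _error = result_q.get()  # blocks until the worker reports
        worker.join()   # the worker has already reported, so this returns promptly
        if status == "finished":
            return done, failures
        failures.append(item)
```

Calling supervise(["a", "bad", "b"]) processes "a", records "bad" as a failure, and lets a second worker finish "b". This only helps if the failure surfaces as a Python exception; if the interpreter itself dies (for example on a segfault inside a C extension), the whole process goes down and a separate process would be needed instead of a thread.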

mitghi