I've been struggling for many days now with a class, PublicationSaver(), that I wrote. It has a method that loads XML documents as strings (not shown here) and passes each loaded string to self.savePublication(self, publication, myDirPath).
Every time I have used it, it crashed after about 25,000 strings. It saves the last string, the one on which it crashes, and I was able to parse that string separately, so I suppose the problem is not bad XML.
I asked about it here but got no answers.
I googled a lot and it seems that I'm not the only one having this problem: here
So, since I really need to complete this task, I thought of this: can I wrap everything in a thread started from main, so that when the lxml parse throws an exception I catch it and send a result to main, which then kills the thread and starts it again?
#threading
result_q = Queue.Queue()
# Create the thread
xmlSplitter = XmlSplitter_Thread(result_q=result_q)
xmlSplitter.run(toSplit_DirPath, target_DirPath)
print "Hello !!!\n"
toSplitDirEmptyB=False
while not toSplitDirEmptyB:
    splitterAlive=True
    while splitterAlive:
        sleep(120)
        splitterAlive=result_q.get()
    xmlSplitter.join()
    print "*** KILLED XmlSplitter_Thread !!! ***\n"
    if not os.listdir(toSplit_DirPath):
        toSplitDirEmptyB=True
    else:
        xmlSplitter.run(toSplit_DirPath, target_DirPath)
Is this a valid approach? When I run the code above, it does not work: "Hello !!!" is never displayed, and the xmlSplitter just keeps going even when it starts to fail (there is an exception handler that keeps it going).
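For reference, here is a minimal sketch of the supervise-and-restart pattern described above. It is not the actual XmlSplitter_Thread code. The names `SplitterWorker`, `supervise`, `process` and `max_restarts` are hypothetical, and the work is modelled as a plain list of items rather than files in a directory. It relies on three facts about Python's threading module: `start()` is what spawns a new thread, whereas calling `run()` directly executes the body in the calling thread; a Thread object can be started only once, so each restart needs a new instance; and a thread cannot be killed from outside, it ends when `run()` returns.

```python
import threading

try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2, as used in the question


class SplitterWorker(threading.Thread):
    """Hypothetical stand-in for XmlSplitter_Thread."""

    def __init__(self, items, result_q, process):
        threading.Thread.__init__(self)
        self.items = items        # shared list of pending work items
        self.result_q = result_q  # channel for reporting back to main
        self.process = process    # callable doing the real work (assumed)

    def run(self):
        # Executed in the new thread only when start() is called.
        try:
            while self.items:
                self.process(self.items[0])
                self.items.pop(0)  # drop the item only after it succeeded
            self.result_q.put(("done", None))
        except Exception as exc:
            # Report the failure to main instead of swallowing it.
            self.result_q.put(("failed", exc))


def supervise(items, process, max_restarts=3):
    """Run workers until items is empty; return the number of restarts."""
    result_q = queue.Queue()
    restarts = 0
    while True:
        # A Thread object can be started only once, so create a new one.
        worker = SplitterWorker(items, result_q, process)
        worker.start()                  # start(), not run()
        status, error = result_q.get()  # blocks until the worker reports
        worker.join()                   # the worker has already finished
        if status == "done":
            return restarts
        if restarts == max_restarts:
            raise error                 # give up after too many failures
        restarts += 1
```

Calling `supervise(items, process)` retries the failing item in a fresh thread. Whether a fresh thread is enough to recover from the lxml failure is an assumption of the sketch: if the problem lives in process-wide state, restarting a separate process may be needed instead.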