Saving memory when parsing very large XML files
You could use this approach, which is a bit newer than the effbot.org one and might save you more memory:
Using Python Iterparse For Large XML Files
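As a minimal sketch of the iterparse idea, using only the standard library: stream over the file and clear already-processed elements so the tree never holds the whole document. The file name "records.xml" and the tags "item"/"name" are placeholders for your own data.

```python
import xml.etree.ElementTree as ET

def iter_records(path, tag):
    """Yield each <tag> element without building the full tree in memory."""
    context = ET.iterparse(path, events=("start", "end"))
    _, root = next(context)          # the first "start" event gives us the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield elem
            root.clear()             # drop references to elements we are done with

# hypothetical usage
for item in iter_records("records.xml", "item"):
    print(item.findtext("name"))
```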
Multiprocessing / Multithreading
If I remember correctly, you cannot easily use multiprocessing to speed up the loading/parsing of the XML. If that were an easy option, everyone would probably already be doing it by default.
Python in general uses a global interpreter lock (GIL), which means a Python process effectively executes bytecode on one CPU core at a time. Threads run within the context of the main Python process and are still bound by that lock, so using threads for CPU-bound work can even decrease performance due to context switching. Running multiple Python processes on multiple cores does bring the expected additional performance, but processes do not share memory, so you need inter-process communication (IPC) to make them work together (you can use a multiprocessing pool, which synchronises when the work is done and is mostly useful for finite tasks that are not too small; a sketch follows this paragraph). Shared memory would be required here, I assume, because every task works on the same big XML.
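As an illustration of the pool approach, here is a hedged sketch in which the main process does the (sequential) parsing and only small, picklable records cross the process boundary to the workers. The file name, tag names and the heavy_work function are made-up placeholders; this only pays off if the per-record work is CPU-heavy compared to the parsing itself.

```python
import multiprocessing as mp
import xml.etree.ElementTree as ET

def heavy_work(record):
    """Placeholder for CPU-bound processing of one extracted record."""
    name, value = record
    return name, len(value)

def extract(path, tag):
    """Parse sequentially in the main process, yield small picklable tuples."""
    context = ET.iterparse(path, events=("start", "end"))
    _, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield (elem.findtext("name", ""), elem.findtext("value", ""))
            root.clear()

if __name__ == "__main__":
    # Only the extracted tuples are pickled and sent to the workers (IPC),
    # never the parsed XML tree itself.
    with mp.Pool() as pool:
        for result in pool.imap_unordered(heavy_work, extract("records.xml", "item")):
            print(result)
```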
lxml, however, has ways to work around the GIL, but they only improve performance under certain conditions.
Threading in lxml
For threading in lxml, there is a section in the FAQ that covers this: http://lxml.de/FAQ.html#id1
Can I use threads to concurrently access the lxml API?
Short answer: yes, if you use lxml 2.2 and later.
Since version 1.1, lxml frees the GIL (Python's global interpreter lock) internally when parsing from disk and memory, as long as you use either the default parser (which is replicated for each thread) or create a parser for each thread yourself. lxml also allows concurrency during validation (RelaxNG and XMLSchema) and XSL transformation. You can share RelaxNG, XMLSchema and XSLT objects between threads.
Does my program run faster if I use threads?
Depends. The best way to answer this is timing and profiling.
The global interpreter lock (GIL) in Python serializes access to the interpreter, so if the majority of your processing is done in Python code (walking trees, modifying elements, etc.), your gain will be close to zero. The more of your XML processing moves into lxml, however, the higher your gain. If your application is bound by XML parsing and serialisation, or by very selective XPath expressions and complex XSLTs, your speedup on multi-processor machines can be substantial.
See the question above to learn which operations free the GIL to support multi-threading.
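To make the FAQ's point concrete, here is a hedged sketch: each thread parses its own file with its own parser, so the GIL-free parsing inside lxml can overlap. The file names are placeholders, and whether this actually runs faster depends on your workload, as the FAQ says (time and profile it).

```python
import threading
from lxml import etree

def parse_one(path, results, index):
    # One parser per thread; lxml releases the GIL while parsing,
    # so the parsing work on separate files can overlap.
    parser = etree.XMLParser()
    tree = etree.parse(path, parser)
    results[index] = len(tree.getroot())   # e.g. number of top-level elements

if __name__ == "__main__":
    paths = ["part1.xml", "part2.xml", "part3.xml"]   # hypothetical input files
    results = [None] * len(paths)
    threads = [threading.Thread(target=parse_one, args=(p, results, i))
               for i, p in enumerate(paths)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(results)
```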
Additional tips on optimizing performance for parsing large XML
https://www.ibm.com/developerworks/library/x-hiperfparse/