I have a dilemma.
I need to read very large XML files from all kinds of sources, so the files are often invalid or malformed XML. I must still be able to read them and extract some information. I need the tag information, so I need an XML parser.
Is it possible to use Beautiful Soup to read the data as a stream instead of loading the whole file into memory?
I tried to use ElementTree, but it chokes on any malformed XML.
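To illustrate the failure, here is a minimal sketch of the kind of thing I tried. The sample data and the `collect_tags` helper are made up for this post and are not my real files or code. It streams with the standard-library `iterparse`, which keeps memory low, but parsing stops with a `ParseError` as soon as the malformed part is reached.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample: the second <item> is never closed, so this is not well-formed XML.
MALFORMED = b"<root><item>first</item><item>second</root>"


def collect_tags(data):
    """Stream through the bytes with iterparse and return (tags, error).

    error is None when the whole document parsed, otherwise the ParseError
    that stopped the parse.
    """
    tags = []
    try:
        for _event, elem in ET.iterparse(io.BytesIO(data), events=("end",)):
            tags.append(elem.tag)
            elem.clear()  # drop the element's children and text to limit memory use
    except ET.ParseError as exc:
        return tags, exc
    return tags, None


tags, error = collect_tags(MALFORMED)
print("tags seen before the failure:", tags)
print("error:", error)
```

With a real file I would pass an open file object instead of `io.BytesIO`, but the result is the same: everything after the first malformed spot is lost.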
If Python is not the best language for this project, please add your recommendations.