I need to parse about 100 kB of HTML data and this simply causes huge performance issues on Android. I've tried both the built-in XML parser and JTidy.
The built-in XML parser gives me a parsing time of about half a second, which I can easily live with. The problem is that using an XML parser on messy HTML is a bad idea, so this is not an option. (I tried preprocessing the HTML, but the parser then started complaining even about valid HTML, so...)
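To show why a strict XML parser chokes on real-world HTML, here is a minimal sketch. It uses the standard javax.xml.parsers DOM parser as a stand-in. That is an assumption: the post doesn't say which built-in parser was used (SAX, DOM or XmlPullParser). `StrictXmlDemo` and `parsesAsXml` are hypothetical names. Every sample is ordinary HTML that browsers accept, but only the XHTML-style one is well-formed XML.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: checks whether HTML snippets survive a strict XML parse.
public class StrictXmlDemo {

    /** Returns true if the snippet is well-formed XML, false if the parser rejects it. */
    static boolean parsesAsXml(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors and ignores the rest, so the parser
            // doesn't fall back to its default handler, which prints to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed XML
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e); // setup or I/O failure, not a markup problem
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "<p>line one<br/>line two</p>", // XHTML-style: well-formed
            "<p>line one<br>line two</p>",  // HTML void element, never closed
            "<p>Fish & chips</p>",          // bare ampersand
            "<p>a&nbsp;b</p>",              // HTML entity that XML doesn't know
        };
        for (String s : samples) {
            System.out.println(parsesAsXml(s) + "  " + s);
        }
    }
}
```

Only the first sample should come back `true`. The unclosed `<br>`, the bare `&` and the `&nbsp;` entity each abort the parse with a fatal error, which is exactly the kind of markup that messy HTML is full of.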
I googled a bit and found JTidy suggested for cleaning up the code before passing it to an XML parser. This was an absolute nightmare: with JTidy preprocessing, parsing now takes approximately 7 seconds.
So now my only real alternative is regex. What do you think?
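For reference, a regex approach only holds up when the extraction target is narrow and the markup is predictable. The post doesn't say what data is needed, so this sketch assumes a hypothetical goal of pulling `href` values out of anchor tags. `RegexLinkExtractor` and `extractLinks` are hypothetical names. The sample input deliberately includes two cases that show where it breaks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example: extracts href values from <a> tags with a regex.
public class RegexLinkExtractor {

    // Matches <a ...href="..."> or href='...', case-insensitively.
    // Group 1 is the opening quote; \1 requires the same quote to close the value.
    private static final Pattern HREF = Pattern.compile(
            "<a\\s[^>]*?href\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(2)); // the attribute value without quotes
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<p><a href=\"/one\">One</a> <A class=x HREF='/two'>Two</A>"
                + " <a href=/three>Three</a>"                // unquoted value: missed
                + " <!-- <a href=\"/old\">Old</a> --></p>";  // inside a comment: still matched
        System.out.println(extractLinks(html)); // prints [/one, /two, /old]
    }
}
```

The pattern is a single linear scan, so it should be fast on 100 kB, but it silently misses the unquoted `href` and picks up the commented-out link. Every such case needs another special rule, which is the usual trade-off with regex on HTML.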