As a learning exercise, I'm writing a web scraper in Common Lisp. The (rough) plan is:
I've just run into a sticking point: the website I'm scraping doesn't always produce valid XHTML. This means that step 3 (parse the pages with xmls) doesn't work. And I'm as loath to use regular expressions as this guy :-)
So, can anyone recommend a Common Lisp package for parsing invalid XHTML? I'm imagining something similar to the HTML Agility Pack for .NET ...