So I want to write a web crawler in C, but there are hardly any libraries to support this.
I can fetch the HTML files using libcurl (which has decent documentation) and use libtidy to convert the HTML to XHTML.
My problem is parsing the HTML files and getting all the links present in them. I know libxml2 exists, but it's extremely hard to understand because there is no good documentation for its API.
Should I even do this in C, or go with another language like Java? Or are there any good alternatives to libxml2?