htmlcxx is a simple non-validating CSS1 and HTML parser for C++
Although several other HTML parsers are available, htmlcxx has some characteristics that make it unique:
- STL-like navigation of the DOM tree, using the excellent tree.hh library by Kasper Peeters
- The original document can be reproduced exactly, character by character, from the parse tree
- Bundled CSS parser
- Optional parsing of attributes
- C++ code that looks like C++ (not so true anymore)
- Offsets of tags/elements in the original document are stored in the nodes of the DOM tree
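
The two points about stored offsets and character-by-character reproduction can be illustrated with a small standalone sketch. It does not use htmlcxx and is not how htmlcxx is implemented: `Span`, `tokenize`, `reconstruct` and `checkRoundTrip` are hypothetical names invented for this example, and the tokenizer is a deliberately naive flat splitter, not a DOM builder. It only shows why recording an offset and length per node, without normalizing anything, is enough to get the original text back.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical node type for illustration only; not htmlcxx's node class.
struct Span {
    std::size_t offset;  // position of the first character in the source
    std::size_t length;  // number of characters covered
    bool isTag;          // true for "<...>" markup, false for text
};

// Split the input into consecutive tag and text spans. Nothing is validated
// or normalized, so every input character belongs to exactly one span.
std::vector<Span> tokenize(const std::string& src) {
    std::vector<Span> spans;
    std::size_t pos = 0;
    while (pos < src.size()) {
        std::size_t end;
        bool isTag = false;
        if (src[pos] == '<') {
            std::size_t close = src.find('>', pos);
            if (close != std::string::npos) {
                isTag = true;
                end = close + 1;
            } else {
                end = src.size();  // unterminated '<': keep the rest as text
            }
        } else {
            end = src.find('<', pos);
            if (end == std::string::npos) end = src.size();
        }
        spans.push_back({pos, end - pos, isTag});
        pos = end;
    }
    return spans;
}

// Rebuild the document by concatenating the source slice of each span.
std::string reconstruct(const std::string& src, const std::vector<Span>& spans) {
    std::string out;
    for (const Span& s : spans) out.append(src, s.offset, s.length);
    return out;
}

// Convenience check: the spans must reproduce the input exactly.
void checkRoundTrip(const std::string& src) {
    assert(reconstruct(src, tokenize(src)) == src);
}
```

Calling `tokenize` on a document and passing the result to `reconstruct` returns the input unchanged, including odd whitespace, mixed-case tags and malformed markup, because the spans only point into the source instead of storing a cleaned-up copy.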