In my application I need to parse simple HTML while using as few external libraries as possible. My HTML looks like this:
<p> First Content is P </p><h2>Header</h2><p> Text under header </p>
<h2>Header 2</h2><p> Paragraph </p>
<h3>yep</h3><p> no </p>
My HTML contains only the tags p, h2, and h3. I have the following structure:
struct Elements {
    std::string tag;
    std::string content;
};
std::vector<Elements> elems;
My goal is that, after parsing, each Elements entry in the vector contains data like this:
tag = "h2"
content = "Header"
and
tag = "p"
content = "First Content is P"
PS: I need to get the elements in the order they appear in the HTML.
Edit:
I just did this in JavaScript and it works fine, but I have basically no idea how to write it in C++:
var a = "<p> First Content is P </p><h2>Header</h2><p> Text under header </p>" +
"<h2>Header 2</h2><p> Paragraph </p>" +
"<h3>yep</h3><p> no </p>";
var output = [];
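a.replace(/<\b[^>]*>(.*?)<\/(.*?)>/gmi, function(m, content, tag) {
    // content is the text between the tags, tag is the name taken from the closing tag
    output.push({
        tag: tag,
        data: content.trim() // strip the padding spaces inside the tags
    });
});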
/*
output:
{ tag: "p", data: "First Content is P"},
{ tag: "h2", data: "Header" }
.....
*/
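For comparison, a rough C++11 translation of the JavaScript above could look like the sketch below. It relies only on the standard library (`std::regex`, which uses the ECMAScript grammar by default) and assumes the input really contains nothing but flat, non-nested p, h2, and h3 elements. The helper names `trim` and `parseHtml` are hypothetical and chosen for illustration only; unlike the JavaScript version, the tag name is captured from the opening tag and the closing tag is required to match it.

```cpp
#include <regex>
#include <string>
#include <vector>

struct Elements {
    std::string tag;
    std::string content;
};

// Hypothetical helper: strip leading and trailing whitespace.
static std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    const auto first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Hypothetical helper: collect every <p>, <h2> and <h3> element in document order.
// Assumes the tags are not nested inside each other.
std::vector<Elements> parseHtml(const std::string& html) {
    // Group 1: tag name. Group 2: inner text (non-greedy).
    // \1 forces the closing tag to match the opening one.
    // [\s\S] is used instead of '.' so the content may span line breaks.
    static const std::regex re(R"(<(p|h2|h3)\b[^>]*>([\s\S]*?)</\1\s*>)",
                               std::regex::icase);

    std::vector<Elements> elems;
    // sregex_iterator walks the matches from left to right, so order is preserved.
    for (std::sregex_iterator it(html.begin(), html.end(), re), end; it != end; ++it) {
        elems.push_back({(*it)[1].str(), trim((*it)[2].str())});
    }
    return elems;
}
```

Calling `parseHtml` on the sample HTML from the question should fill the vector with seven entries in source order, starting with tag "p" / content "First Content is P" and then tag "h2" / content "Header". The tag is stored as written in the input, so an uppercase `<P>` would be stored as "P". A regex like this is only reasonable because the markup is so restricted; attributes containing `>` or nested elements would need a real parser.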