I have an HTML file with very badly formatted markup that I get from a website, and I want to extract a few small pieces of information from it.
I am only interested in lines that start like this:
</form></td><td><a href="http://www.mysite.com/users/user897" class="username"> <b>user897</b></a></td></tr><tr><td>HouseA</td><td>2</td><td class="entriesTableRow-gamename">HouseA Type12 <span class="entriesTableRow-moredetails"></span></td><td>1 of 2</td><td>user123</td><td>10</td><td>
and I want to extract 4 fields:
A:HouseA
B:HouseA Type12
C:user123
D:10
I know I've seen people recommend HTML Agility Pack and libxml2, but I really don't think I need all that. My app is in C/C++.
I am already using getline to read the file line by line; I am just not sure what the best way to proceed is. Thanks!
#include <cstdio>
#include <fstream>
#include <string>

int main()
{
    std::ifstream data("Home.html");
    std::string line;
    const std::string prefix = "</form></td><td>";
    int linenum = 0;  // 1-based number of the current line
    while (std::getline(data, line))
    {
        linenum++;
        // Wanted lines begin with this exact prefix.
        if (line.compare(0, prefix.size(), prefix) == 0)
        {
            std::printf("found a wanted line in line:%d\n", linenum);
        }
    }
}
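
For illustration, here is a minimal sketch of one way to proceed using only std::string searches. It assumes every wanted line has the same layout as the sample above: the fields sit in the `<td>` cells that follow the first `<tr>`, and no attribute value contains a `>` character. The helper names `stripTags` and `cellsAfterRowStart` are hypothetical, and this is not a general HTML parser; it would likely break if the site changes its markup.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: drop everything between '<' and '>' and trim
// surrounding whitespace, so "HouseA Type12 <span ...></span>" becomes
// "HouseA Type12".
std::string stripTags(const std::string& s)
{
    std::string out;
    bool inTag = false;
    for (char c : s) {
        if (c == '<') inTag = true;
        else if (c == '>') inTag = false;
        else if (!inTag) out += c;
    }
    const char* ws = " \t\r\n";
    const std::size_t first = out.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    const std::size_t last = out.find_last_not_of(ws);
    return out.substr(first, last - first + 1);
}

// Hypothetical helper: collect the text of every complete <td>...</td>
// cell that appears after the first "<tr>" on the line. A trailing <td>
// without a closing tag on the same line is ignored.
std::vector<std::string> cellsAfterRowStart(const std::string& line)
{
    std::vector<std::string> cells;
    std::size_t pos = line.find("<tr>");
    if (pos == std::string::npos) return cells;
    for (;;) {
        const std::size_t open = line.find("<td", pos);
        if (open == std::string::npos) break;
        // Assumption: the first '>' after "<td" ends the opening tag.
        const std::size_t tagEnd = line.find('>', open);
        if (tagEnd == std::string::npos) break;
        const std::size_t close = line.find("</td>", tagEnd);
        if (close == std::string::npos) break;
        cells.push_back(stripTags(line.substr(tagEnd + 1, close - tagEnd - 1)));
        pos = close + 5;  // skip past "</td>"
    }
    return cells;
}
```

Inside the getline loop, a wanted line could be passed to `cellsAfterRowStart(line)`. Under the layout assumed above, A would be cell 0, B cell 2, C cell 4 and D cell 5, so it is worth checking that at least 6 cells came back before indexing.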