
I need to extract all links from an HTML page using regular expressions in C++. Can anybody help me, please?

  • possible duplicate of [Regex to get the link in href. \[asp.net\]](http://stackoverflow.com/questions/1496619/regex-to-get-the-link-in-href-asp-net) – Jerry Coffin Sep 30 '10 at 19:47
  • Have you looked at [boost's regex](http://www.boost.org/doc/libs/1_44_0/libs/regex/doc/html/index.html) or [tr1](http://en.wikipedia.org/wiki/C%2B%2B_Technical_Report_1) regex? – dgnorton Sep 30 '10 at 19:48
  • Why do you have to use regular expressions for that task? There are more appropriate tools, like parsers. – Roland Illig Sep 30 '10 at 20:04
  • http://stackoverflow.com/questions/1732348?page=2&tab=votes#tab-top – SingleNegationElimination Sep 30 '10 at 20:33
  • Agree with Roland Illig. _Regular_ expressions are used for _regular_ languages. That's a precisely defined term, and it's a fact that HTML is _not_ regular. – MSalters Oct 01 '10 at 08:05
  • @TokenMacGuy: I don't think C++ regexes have that "recursive" extension. Better link the top answer, http://stackoverflow.com/questions/1732348?page=1&tab=votes#tab-top – MSalters Oct 01 '10 at 08:14

1 Answer


This is a hard job for a regex, and in C++ it's even harder. I actually wrote a parser for a school project a few years ago. You can use it if it works for you, but test it on your own input before you rely on it for anything important.

Feel free to modify or use it however you like.

I realized there were some mistakes in my code and that I should also include the header file. The CMakeLists file is included too, but it's trivial. ParserTest.cpp lets you parse links from an input string given on the command line.

http://www.mediafire.com/?0u5ppq0gzgdyg

Falmarri
  • Agreed. But it depends on what you mean by parsing links. Do you want to parse ALL the links? Parsing links with a regex is fine; it's just not guaranteed to work =] – Falmarri Sep 30 '10 at 20:05