How can I simply parse HTML links? For example, I receive an HTTP response containing HTML, which has links to other files that need to be downloaded, such as JPGs, CSS files and JS files. What is the simplest way to parse all these references?
-
If you need it in c++, then tag it c++ the next time... oh and you should **totally** try regex: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Ivo Wetzel Jan 06 '11 at 14:49
-
@ivo, you suggest regex and point to the bane of parsing html with regex .. *hmmm..*, are you missing a **not** in there? – Gabriele Petrioli Jan 06 '11 at 14:51
-
@Gaby Doesn't the link itself stand for sarcasm? :) – Ivo Wetzel Jan 06 '11 at 14:53
-
@Ivo - and those who _don't_ follow the link? Do you think _they_ will get the sarcasm? – Oded Jan 06 '11 at 14:54
-
@Ivo, wasn't sure but truth is that new members might not get it ... (*you did not have a single smiley .. :p*) – Gabriele Petrioli Jan 06 '11 at 15:50
1 Answer
Use an HTML parser for your platform/language.
There are some recommendations for C++ ones here.
Once you have a parsed document, you will need to look at each `src` and `href` attribute in it. You will also need to remember the `base` tag, if one exists, and add logic for external, relative and absolute paths.
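
As a rough illustration of those steps, here is a minimal sketch in Python, chosen only because its standard library ships an HTML parser (`html.parser`) and a URL resolver (`urllib.parse.urljoin`). The names `LinkCollector` and `extract_links` are made up for this example, and the same approach should carry over to whichever C++ parser you pick: collect every `src` and `href`, note the first `base` tag, then resolve each value against the base URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical helper: gathers src/href values from an HTML document."""

    def __init__(self, page_url):
        super().__init__()
        self.base_url = page_url  # replaced if a <base href="..."> is found
        self.base_seen = False
        self.raw_links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "base":
            # Only the first <base> with an href counts.
            if not self.base_seen and attrs.get("href"):
                self.base_url = urljoin(self.base_url, attrs["href"])
                self.base_seen = True
            return
        for name in ("src", "href"):
            value = attrs.get(name)
            if value:
                self.raw_links.append(value)

    def resolved_links(self):
        # urljoin leaves absolute URLs alone and resolves relative ones
        # (both "dir/file" and "/dir/file" forms) against the base URL.
        return [urljoin(self.base_url, link) for link in self.raw_links]


def extract_links(html_text, page_url):
    """Return absolute URLs for every src/href found in html_text."""
    collector = LinkCollector(page_url)
    collector.feed(html_text)
    collector.close()
    return collector.resolved_links()
```

Calling `extract_links(body, url_of_the_page)` on the response body gives you a list of absolute URLs to download. Note that this sketch also picks up ordinary `<a href>` links, so you may want to filter by tag name or file extension if you only need images, stylesheets and scripts.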