2

I need to get a string from a comment in an HTML file. I tried to do it with the DOM, but I couldn't find a good solution that way.

So I want to try regular expressions instead, but I can't find a satisfactory solution there either. Can you help me, please?

This is what I need:

<!--adress-"String here I need to get"-->

Thanks in advance for any answers.

genesis
Lukáš Jelič
  • HTML isn't a regular language. It cannot be correctly parsed with regular expressions. – Mark Byers Oct 02 '11 at 18:38
  • Take a look here :) http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – IanNorton Oct 02 '11 at 18:39
  • This may help http://simplehtmldom.sourceforge.net/manual.htm – Caffeinated Oct 02 '11 at 18:39
  • @Mark: you shouldn't parse HTML with a Regex. However, the argumentation that HTML is not a regular language is usually bogus (since regular expressions are rarely regular, in any existing implementation) – sehe Oct 02 '11 at 18:40

3 Answers

4

Inspect $matches after running this code:

preg_match('~<!--adress-"(.*?)"-->~msi', $string, $matches);
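To show how the captured text can be read back, here is a minimal sketch. The sample `$string` and the `$address` variable are assumptions made up for illustration; only the comment format and the pattern come from the thread.

```php
<?php
// Hypothetical sample input; only the comment format comes from the question.
$string = '<p>Intro</p><!--adress-"String here I need to get"--><p>More</p>';

// preg_match() returns 1 when the pattern matches, 0 when it does not.
if (preg_match('~<!--adress-"(.*?)"-->~msi', $string, $matches) === 1) {
    // $matches[0] holds the whole comment, $matches[1] the text between the quotes.
    $address = $matches[1];
} else {
    $address = null; // no matching comment found
}

var_dump($address);
```

Checking the return value first avoids reading `$matches[1]` when the comment is absent from the input.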
genesis
  • Does anybody care to explain his/her downvote so I can improve my answer? – genesis Oct 02 '11 at 18:42
  • I don't know who downvoted you, but it's probably a knee-jerk reaction to your writing a regular expression to "parse HTML". I think it's just wrong of him or her, though, because I don't see why a regular expression cannot be used to extract HTML comments. – Daniel Trebbien Oct 02 '11 at 18:47
  • One thing: Not all [HTML comments](http://www.w3.org/TR/html4/intro/sgmltut.html#h-3.2.4) are matched by your regexp. Comments are delimited by `--` within the markup declaration open delimiter (`<!`) and the markup declaration close delimiter (`>`). – Daniel Trebbien Oct 02 '11 at 18:50
  • @DanielTrebbien: I just followed OP's needs – genesis Oct 02 '11 at 18:55
  • @DanielTrebbien according to the specs, `` ends it, so I fail to see what's wrong with that answer? – Madara's Ghost Oct 02 '11 at 19:01
  • @RikudoSennin: White space is not permitted between the markup declaration open delimiter and the comment open delimiter, but it is permitted between the comment close delimiter and the markup declaration close delimiter. Valid comment: ` – Daniel Trebbien Oct 02 '11 at 19:05
  • Yes, that is correct. However, not needed. REGEX cannot correctly parse HTML/XML, because it can never be made to take every possible aspect into account, but you can compromise on a stricter syntax in favor of shorter REGEX. The OP has stated his syntax, that expression is valid. – Madara's Ghost Oct 02 '11 at 19:08
1

HTML comments are regular; you can just match <!--adress-"([^">]+)"--> and get the first group.

This assumes that the comments are always well-formed and always have a quoted string containing no quotes.
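A rough sketch of what those assumptions mean in practice. The `~` delimiters, the two sample inputs, and the variable names are illustrative additions, not part of the answer above.

```php
<?php
// The pattern from the answer, wrapped in ~ delimiters for PHP's preg functions.
$pattern = '~<!--adress-"([^">]+)"-->~';

// Hypothetical inputs for illustration.
$ok  = '<!--adress-"12 Main Street"-->';
$bad = '<!--adress-"a > b"-->'; // contains ">", which the character class excludes

// Take the first capture group when the pattern matches, otherwise null.
$okResult  = preg_match($pattern, $ok, $m) ? $m[1] : null;
$badResult = preg_match($pattern, $bad, $m2) ? $m2[1] : null;

var_dump($okResult, $badResult);
```

The second input does not match, because `[^">]+` stops at the first `>` and the closing `"-->` is not found at that position.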

SLaks
1

This will be more accurate:

$regex = '~<!--(.+?)-"{0,1}(.+?)"{0,1}-->~'; // explicit ~ delimiters, so < and > are matched literally
preg_match_all($regex, $html, $matches_array);

Just run var_dump($matches_array) to see the results.
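As a sketch of what the result array looks like, the snippet below runs the pattern (written with explicit `~` delimiters) over a made-up `$html` value. The second comment, `phone`, is a hypothetical example that does not appear in the question.

```php
<?php
// Hypothetical input with two comments in the label-"value" format.
$html = '<!--adress-"First"--><p>x</p><!--phone-"Second"-->';

$regex = '~<!--(.+?)-"{0,1}(.+?)"{0,1}-->~';
preg_match_all($regex, $html, $matches_array);

// With the default PREG_PATTERN_ORDER flag:
//   $matches_array[0] lists the full matches,
//   $matches_array[1] lists the labels before the dash,
//   $matches_array[2] lists the text after the dash.
$labels = $matches_array[1];
$values = $matches_array[2];

var_dump($labels, $values);
```

Because both groups are lazy and the quotes are optional, this pattern accepts any comment containing a dash, so it is more permissive than one that names the `adress` label explicitly.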

Vladimir Fesko