
I'm trying to understand how PHP's preg_match function works, but I just can't get the result I want.

Basically, I have a string, `$html`, that contains HTML code, and I need to extract part of it. The string looks like this:

... <div class="countrys CZ level1" id="CZ" alt="Česk&aacute; republika" ><span class="warn awt l3 t2"></span><div class="tendenz awt nt l3"></div></div></a></td><td class ...

From this string I want to extract everything inside the div with id="CZ", but not the div class="tendenz..." that follows it.

So far I have tried this:

preg_match("/alt=\"Česk&aacute; republika\" >(.*)/", $html, $results);
echo $results[1];

This gets the beginning right (the match starts at the correct place), but it runs to the end of the string instead of stopping at the start of the next div. So then I tried:

preg_match("/alt=\"Česk&aacute; republika\" >(.*)<div/", $html, $results);
echo $results[1];

But for some reason, although the beginning is still correct, the match keeps going toward the end of the string instead of stopping at the next "div".
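A minimal sketch of why the second pattern overshoots. It assumes the line continues after `</td>` with more markup, as the `...` in the question suggests; the `hypothetical` cell with an `XX` div below is invented for illustration. A greedy `(.*)` first consumes the rest of the line and then backtracks only until `<div` can match, so it ends at the *last* `<div`, not the next one. The lazy form `(.*?)` grows one character at a time and stops at the first `<div`. Note also that `.` does not match newlines unless the `s` modifier is used, so a greedy match ends at the end of the current line rather than necessarily the end of the whole string.

```php
<?php
// Sample markup from the question, followed by a HYPOTHETICAL extra cell
// (the "XX" div is invented) standing in for the "..." continuation.
$html = '<div class="countrys CZ level1" id="CZ" alt="Česk&aacute; republika" >'
      . '<span class="warn awt l3 t2"></span><div class="tendenz awt nt l3"></div></div></a></td>'
      . '<td class="hypothetical"><div class="countrys XX level1" id="XX"></div></td>';

// Greedy: (.*) runs to the end of the line, then backtracks only until
// "<div" can match, which is the LAST "<div" on the line.
preg_match('/alt="Česk&aacute; republika" >(.*)<div/', $html, $greedy);

// Lazy: (.*?) expands one character at a time, so it stops at the FIRST "<div".
preg_match('/alt="Česk&aacute; republika" >(.*?)<div/', $html, $lazy);

echo $greedy[1], "\n"; // everything up to the hypothetical XX div
echo $lazy[1], "\n";   // only the <span> element
```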

Any ideas what is wrong with my code? I really appreciate your help.

Jachym
  • Generally speaking, you should use a proper parser and [**not regex**](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) to work with HTML, and PHP has several parsers you can use. – adeneo Feb 02 '14 at 20:19
  • I know, but since the first part works, why doesn't the second part (cutting off the end)? It would be much easier for me to do it this way for now, and I might try other parsers later. – Jachym Feb 02 '14 at 20:21
  • You know you're matching `
    – adeneo Feb 02 '14 at 20:25
  • I prefer `preg_match` to `DOMDocument`; however, a very complicated regex would be required in this case, so I recommend using `DOMDocument`. – mpyw Feb 02 '14 at 20:35
  • I just don't understand why it is so easy to find the beginning, but then the match doesn't stop when the next div is encountered. Shouldn't that be very similar to starting at the point I specified? – Jachym Feb 02 '14 at 20:39
  • OK, I solved it: it required `(.*?)` instead of just `(.*)`. – Jachym Feb 02 '14 at 20:43
  • DOM Sample: http://ideone.com/ijwXxE – mpyw Feb 02 '14 at 20:52
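Following the comments that recommend a proper parser, here is a minimal `DOMDocument` sketch. It is not the linked ideone sample. It assumes the input is just the CZ div from the question and that the goal is "everything inside the div, minus the nested `tendenz` div". `DOMXPath` locates the div by its `id`, and every child node except the `tendenz` div is serialized with `saveHTML()`.

```php
<?php
// Assumed input: only the CZ div from the question, closed so it is complete.
$html = '<div class="countrys CZ level1" id="CZ" alt="Česk&aacute; republika" >'
      . '<span class="warn awt l3 t2"></span><div class="tendenz awt nt l3"></div></div>';

// Scraped pages are rarely valid HTML; collect libxml warnings instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
// The encoding hint makes libxml read the string as UTF-8 (otherwise it assumes ISO-8859-1).
$doc->loadHTML('<?xml encoding="UTF-8">' . $html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$div = $xpath->query('//div[@id="CZ"]')->item(0);

$parts = [];
if ($div !== null) {
    foreach ($div->childNodes as $child) {
        // Skip the nested "tendenz" div; serialize every other child node.
        if ($child instanceof DOMElement
            && $child->tagName === 'div'
            && preg_match('/\btendenz\b/', $child->getAttribute('class'))) {
            continue;
        }
        $parts[] = $doc->saveHTML($child);
    }
}

$inner = implode('', $parts);
echo $inner, "\n"; // <span class="warn awt l3 t2"></span>
```

Unlike the regex, this does not depend on attribute order or on the exact spacing before `>` (such as the `" >"` after the `alt` value).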
