
I have the following situation:

$text = "This is some <span class='classname'>example</span> text i'm writing to
demonstrate the <span class='classname otherclass'>problem</span> of this.<br />";

preg_match_all("|<[^>/]*(classname)(.+)>(.*)</[^>]+>|U", $text, $matches, PREG_PATTERN_ORDER);

I need an array ($matches) in which one field holds "<span class='classname'>example</span>" and another holds "example". What I get instead is one field with "<span class='classname'>example</span>" and one with "classname".

It should also contain the values for the other matches, of course.

How can I get the right values?

John Doe Smith
  • Best advice: forget regexes exist, and switch to using DOM. It'll take you far less time to come up with a nice simple XPath query and a few dom node-extraction calls than it will to get the equivalent regex working - plus you won't beat your brain into a pulp doing so. – Marc B Aug 27 '12 at 15:25
  • Die Cthulu, die!! Go back from whence you came... how long... noooo darkness reigns supreme [here](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) **JUST PARSE THE HTML** – Elias Van Ootegem Aug 27 '12 at 15:25
  • [The pony, he comes...](http://stackoverflow.com/a/1732454/1338999) – Matt Aug 27 '12 at 15:26
  • I have to agree, there are better ways for parsing HTML (as linked above). However, have you tried dumping your $matches variable? A copy paste of your code and a var_dump, provided me with $matches[3] as an array containing the values you were looking for. – Chris Aug 27 '12 at 15:29
  • Thank you Chris: That's the right answer! √ – John Doe Smith Aug 27 '12 at 15:34
  • Just a slight remark: Why would anyone use pipes as regex delimiters?? that's like amputating a limb, IMHO – Elias Van Ootegem Aug 27 '12 at 15:48

2 Answers


The safe/easy way:

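$text = 'blah blah blah';

$dom = new DOMDocument();
$dom->loadHTML($text);

$xp = new DOMXPath($dom);

// Note: @class='classname' only matches elements whose class attribute is exactly "classname".
$nodes = $xp->query("//span[@class='classname']");
foreach ($nodes as $node) {
    $innertext = $node->nodeValue;
    // Outer HTML of the node (PHP >= 5.3.6); see also
    // http://stackoverflow.com/questions/2087103/innerhtml-in-phps-domdocument
    $html = $dom->saveHTML($node);
}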
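
As a rough sketch of the same approach applied to the text from the question: the XPath below uses the common class-token idiom so that `class='classname otherclass'` is matched as well. The `$results` array and its `html`/`text` keys are names made up for this illustration, not part of the original answer.

```php
<?php
// Sample input taken from the question.
$text = "This is some <span class='classname'>example</span> text i'm writing to
demonstrate the <span class='classname otherclass'>problem</span> of this.<br />";

$dom = new DOMDocument();
$dom->loadHTML($text);

$xp = new DOMXPath($dom);

// Match any span whose class attribute contains the token "classname",
// even when other classes are listed next to it.
$query = "//span[contains(concat(' ', normalize-space(@class), ' '), ' classname ')]";

$results = []; // hypothetical container, one entry per matching span
foreach ($xp->query($query) as $node) {
    $results[] = [
        'html' => $dom->saveHTML($node), // serialized element, tags included
        'text' => $node->textContent,    // text between the tags
    ];
}

print_r($results);
```

Note that `saveHTML()` re-serializes the node, so the markup it returns may differ in quoting or whitespace from the original source string.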
Marc B

You would be better off with a DOM parser; however, this question is really about how capturing works in regexes in general.

You are getting classname as a match because you capture it by putting () around it. Those parentheses are unnecessary, so you can just remove them. Similarly, you don't need them around .+, since you don't want to capture that. With both removed, the tag content becomes the first captured group.

If you do need () purely for grouping rather than capturing, start the group with ?:, as in (?:...), and it won't be captured.
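
A minimal sketch of both points, using the text from the question. The first pattern is the question's own with the two unwanted groups removed (and `~` as delimiter). The second pattern, with its `(?:span|em)` alternation, is a made-up variant that only illustrates non-capturing groups.

```php
<?php
// Sample input taken from the question.
$text = "This is some <span class='classname'>example</span> text i'm writing to
demonstrate the <span class='classname otherclass'>problem</span> of this.<br />";

// Pattern from the question without the parentheses around "classname" and ".+".
// Only the tag content is still captured, so it ends up in group 1.
preg_match_all('~<[^>/]*classname.+>(.*)</[^>]+>~U', $text, $matches, PREG_PATTERN_ORDER);

$fullTags = $matches[0]; // whole matches, tags included
$contents = $matches[1]; // text between the tags

// Hypothetical variant: (?:span|em) groups the alternation without capturing it,
// so the tag content is still group 1 and no extra group is created.
preg_match_all('~<(?:span|em)\b[^>]*classname[^>]*>(.*)</(?:span|em)>~U', $text, $grouped);

print_r($fullTags);
print_r($contents);   // "example" and "problem"
print_r($grouped[1]); // "example" and "problem"
```

With `PREG_PATTERN_ORDER`, `$matches[0]` holds every full match and `$matches[1]` every first-group capture, which is the pair of values the question asks for.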

Niet the Dark Absol