
Hello, I have a problem with the regex I use to extract a value from an HTML tag in PHP. The following strings are possible:

<span class="down last_position">xyz</span>
<span class="up last_position">xyz</span>
<span class="last_position new">xyz</span>

I am using the following preg_match call:

preg_match('#<span class="last_position.*?">(.+)</span>#', $string, $matches);

This only covers case #3, because the pattern requires the class value to start with last_position. What would I need to add in front of last_position to match all three cases?

Thanks a lot.

Edit: For anyone wondering which value should be matched: "xyz".

Sebastian
  • Don't use regex to parse HTML. While there are cases where regex can be used, the task you are doing is best done with an HTML parser. – nhahtdh Apr 26 '13 at 07:52
  • Yes, I know; I am using DOMDocument for the whole parsing. I was just wondering if somebody would know how to do it with regex. – Sebastian Apr 26 '13 at 07:53
  • http://stackoverflow.com/questions/6366351/getting-dom-elements-by-class-name – nhahtdh Apr 26 '13 at 07:56

5 Answers


Avoid using regex to parse HTML, as it is error-prone. Your specific use case is better solved with a DOM parser:

$html = <<< EOF
<span class="down last_position">xyz</span>
<span class="up last_position">xyz</span>
<span class="last_position new">xyz</span>
EOF;
$doc = new DOMDocument();
libxml_use_internal_errors(true); // suppress warnings about the HTML fragment
$doc->loadHTML($html); // loads your html
$xpath = new DOMXPath($doc);
// select the text node of every span whose class attribute contains "last_position"
$nodeList = $xpath->query("//span[contains(@class, 'last_position')]/text()");
foreach ($nodeList as $node) {
    var_dump($node->nodeValue);
}

OUTPUT:

string(3) "xyz"
string(3) "xyz"
string(3) "xyz"
anubhava
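One caveat with the XPath above: contains(@class, 'last_position') also matches any class name that merely contains that substring, such as a hypothetical last_positions. A minimal sketch that matches the exact class token instead, assuming the three sample spans plus one made-up non-matching span, pads the normalized class attribute with spaces before testing it:

```php
<?php
// Sketch: match spans whose class list contains the exact token "last_position".
// The fourth span (class "last_positions") is a made-up example that the
// plain contains(@class, 'last_position') test would also have matched.
$html = <<<EOF
<span class="down last_position">xyz</span>
<span class="up last_position">xyz</span>
<span class="last_position new">xyz</span>
<span class="last_positions">not me</span>
EOF;

libxml_use_internal_errors(true); // suppress warnings about the HTML fragment
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// " down last_position " contains " last_position ", " last_positions " does not.
$query = "//span[contains(concat(' ', normalize-space(@class), ' '), ' last_position ')]";

$values = [];
foreach ($xpath->query($query) as $span) {
    $values[] = $span->textContent;
}
echo implode(',', $values), "\n"; // xyz,xyz,xyz
```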

Try using this:

preg_match('#<span class="?(.*)last_position.*?">(.+)</span>#', $string, $matches);
Kailash Yadav
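Because the pattern in this answer has two capture groups, the extracted value is not in $matches[1]. A quick check, run here against only the first sample string:

```php
<?php
// Sketch: run the pattern from this answer on one sample string and
// inspect which capture group holds the value.
$string = '<span class="down last_position">xyz</span>';
preg_match('#<span class="?(.*)last_position.*?">(.+)</span>#', $string, $matches);

// Group 1 is the (.*) before last_position, so the value is in group 2.
var_dump($matches[1]); // string(5) "down "
var_dump($matches[2]); // string(3) "xyz"
```

If the value should stay in $matches[1], the prefix group can be made non-capturing with (?:.*).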

You could try this:

preg_match_all('#<span class="[^"]*last_position[^"]*">(.+)</span>#', $string, $matches, PREG_PATTERN_ORDER);

You'll then find the values in $matches[1][0], $matches[1][1], $matches[1][2], and so on.

The part I added to the class attribute's value, [^"]*, matches any number of characters other than a double quote. It therefore matches anything inside the attribute value without running past the attribute's closing quote.

Gordon
Nikolas
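The difference the [^"]* class makes shows up when several spans sit on a single line. The sketch below uses a made-up one-line input and, as an assumption beyond the answer, a lazy (.+?) capture, since a greedy (.+) would also run on to the last </span> on the line:

```php
<?php
// Sketch: compare a greedy .* prefix with the [^"]* character class when
// several spans are on the same line (made-up input).
$oneLine = '<span class="down last_position">a</span>'
         . '<span class="up last_position">b</span>'
         . '<span class="last_position new">c</span>';

// .* may cross the closing quote and other tags, so the three spans merge into one match.
preg_match_all('#<span class=".*last_position.*">(.+?)</span>#', $oneLine, $greedy);

// [^"]* cannot leave the attribute value, so each span is matched separately.
preg_match_all('#<span class="[^"]*last_position[^"]*">(.+?)</span>#', $oneLine, $bounded);

echo implode(',', $greedy[1]), "\n";  // c
echo implode(',', $bounded[1]), "\n"; // a,b,c
```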

Try the following (and yes, you can use regex to match data from HTML):

$string = '<span class="down last_position">xyz</span>
<span class="up last_position">xyz</span>
<span class="last_position new">xyz</span>';

preg_match_all('#<span\s.*?class=".*?last_position.*?".*?>(.+?)</span>#i', $string, $m);
print_r($m);


HamZa
  • I accepted this answer because my question was not how to do it with DOMDocument but with regex. And it was very detailed, thank you :) – Sebastian Apr 26 '13 at 08:06
  • Please explain the pattern. Why does this work? – Gordon Apr 26 '13 at 08:30
  • @HamZaDzCyberDeV: `almost all situations` How do you quantify "almost all"? http://regex101.com/r/tZ3pA2 I can't do anything about my votes, since it is locked, but I don't think this answer deserves an upvote. – nhahtdh Apr 26 '13 at 08:32
  • @nhahtdh Ok you got me there, I'm not gonna argue since you're against the idea of using regex for HTML in the first place. – HamZa Apr 26 '13 at 08:50
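On the question of how the pattern works: here it is rewritten with PCRE's x (extended) modifier so each piece can carry a comment. The delimiter is switched from # to ~ because # starts a comment in extended mode; otherwise the pattern is unchanged, since it contains no literal whitespace that x would discard. The sample string is the one from the answer:

```php
<?php
// Sketch: the answer's pattern in extended (x) mode, with one comment per piece.
$pattern = '~
    <span\s           # "<span" followed by one whitespace character
    .*?               # lazily skip any attributes that come before class
    class="           # start of the class attribute value
    .*?last_position  # lazily skip other class names up to last_position
    .*?"              # lazily skip the rest of the class value up to a quote
    .*?>              # lazily skip remaining attributes up to the closing >
    (.+?)             # group 1: the text content, lazy so it stops at the first </span>
    </span>           # closing tag
~ix';

$string = '<span class="down last_position">xyz</span>
<span class="up last_position">xyz</span>
<span class="last_position new">xyz</span>';

preg_match_all($pattern, $string, $m);
echo implode(',', $m[1]), "\n"; // xyz,xyz,xyz
```

Because . also matches > and ", the lazy .*? pieces are not confined to a single tag or attribute, so markup that deviates from the sample shape can still produce surprising matches.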

Sure, XML cannot be parsed with regex in general, because XML is not a regular language. But in many real-world cases, the documents used as input are limited and predictable enough to be treated simply as text.

Something like this should work for you:

preg_match('#<span class="[^>"]*?last_position[^>"]*">(.+)</span>#', $string, $matches);
richardtallent