How to get the value using Regex?

Question

Hello, I have a problem with the regex I use to extract a value from an HTML tag in PHP. The input can be any of the following strings:

<span class="down last_position">xyz</span>
<span class="up last_position">xyz</span>
<span class="last_position new">xyz</span>

And I have the following preg_match command:

preg_match('#<span class="last_position.*?">(.+)</span>#', $string, $matches);

This pretty much only covers case #3. What would I need to add in front of last_position to cover all three cases?

Thanks a lot..

Edit: For all who are wondering what value is to be matched: "xyz"

Don't use regex to parse HTML. While there are cases where regex can be used, the task that you are doing is best done with a HTML parser. — nhahtdh, Apr 26 '13 at 07:52
Yes I know, I am using DomDocument for the whole parsing.. I was just wondering if somebody would know... — Sebastian, Apr 26 '13 at 07:53
http://stackoverflow.com/questions/6366351/getting-dom-elements-by-class-name — nhahtdh, Apr 26 '13 at 07:56

anubhava · Answer 1 · 2013-04-26T08:13:16.420

5

Avoid using regex to parse HTML, as it is error-prone. Your specific use case is better solved with a DOM parser:

$html = <<< EOF
<span class="down last_position">xyz</span>
<span class="up last_position">xyz</span>
<span class="last_position new">xyz</span>
EOF;
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html); // loads your html
$xpath = new DOMXPath($doc);
$nodeList = $xpath->query("//span[contains(@class, 'last_position')]/text()");
// DOMNodeList is traversable, so foreach visits each matched text node in order
foreach ($nodeList as $node) {
    var_dump($node->nodeValue);
}

OUTPUT:

string(3) "xyz"
string(3) "xyz"
string(3) "xyz"
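
One caveat: contains(@class, 'last_position') is a substring test, so it would also match a class such as last_position_old. A common XPath 1.0 idiom pads the class list with spaces and tests for the whole token. A minimal sketch, assuming markup like the question's (the last_position_old span and the a/b/c values are made up for illustration):

```php
<?php
// Sample markup; the "last_position_old" span is a hypothetical counter-example.
$html = '<span class="down last_position">a</span>'
      . '<span class="last_position new">b</span>'
      . '<span class="last_position_old">c</span>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep libxml parse warnings out of the output
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Pad the normalized class list with spaces so only the whole token matches.
$query = "//span[contains(concat(' ', normalize-space(@class), ' '), ' last_position ')]";

$values = [];
foreach ($xpath->query($query) as $span) {
    $values[] = $span->textContent;
}
print_r($values); // the span with class "last_position_old" is skipped
```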

edited Apr 26 '13 at 08:13

answered Apr 26 '13 at 07:54

anubhava


And regarding performance, which is better to use: DOM or preg_match_all? – ElSinus Apr 26 '13 at 08:02
1

@ElSinus: People complain about regex performance as well, so I'm not sure which one will be faster. – anubhava Apr 26 '13 at 08:09

score 1 · Answer 2 · answered Apr 26 '13 at 07:54

1

Try this:

preg_match('#<span class="?(.*)last_position.*?">(.+)</span>#', $string, $matches);
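
Note that this pattern has two capture groups, so the span's text ends up in $matches[2] rather than $matches[1]. A quick check on one of the strings from the question (the $prefix and $value variable names are only for illustration):

```php
<?php
$string = '<span class="down last_position">xyz</span>';

// Group 1 captures whatever precedes "last_position" in the class attribute;
// group 2 captures the text between the tags.
preg_match('#<span class="?(.*)last_position.*?">(.+)</span>#', $string, $matches);

$prefix = $matches[1];
$value  = $matches[2];
var_dump($prefix, $value);
```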

answered Apr 26 '13 at 07:54

Kailash Yadav


1

Note that if a bunch of span tags are on the same line, this is not going to work. – nhahtdh Apr 26 '13 at 07:57
@nhahtdh it's not mentioned in the question. – Kailash Yadav Apr 26 '13 at 07:59
It depends on how many assumptions you want to make. Currently, it is going to break very easily if the line breaks change. – nhahtdh Apr 26 '13 at 08:03
please explain the pattern. Why does this work? – Gordon Apr 26 '13 at 08:29

score 1 · Answer 3 · edited Apr 26 '13 at 09:01

1

You could try this:

preg_match_all('#<span class="[^"]*last_position[^"]*">(.+)</span>#', $string, $matches, PREG_PATTERN_ORDER);

You'll then find the values in $matches[1][0], $matches[1][1], $matches[1][2] ....

The part I added inside the class attribute's value, [^"]*, matches any number of characters other than a double quote. Thus it matches anything else inside the attribute's value.
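
Here is a small sketch of the pattern applied to the three strings from the question. It also shows what the greedy (.+) does when two spans share a line. The one-line $sameLine input is a made-up example, and the lazy (.+?) variant is a suggested tweak, not part of the pattern above:

```php
<?php
$string = '<span class="down last_position">xyz</span>
<span class="up last_position">xyz</span>
<span class="last_position new">xyz</span>';

$pattern = '#<span class="[^"]*last_position[^"]*">(.+)</span>#';
preg_match_all($pattern, $string, $matches, PREG_PATTERN_ORDER);
print_r($matches[1]); // one value per line, since . does not match a newline

// Hypothetical input: two spans on the same line.
$sameLine = '<span class="up last_position">a</span><span class="last_position new">b</span>';

// Greedy (.+) runs to the last </span> on the line, so both spans merge into one match.
preg_match_all($pattern, $sameLine, $greedy);

// Assumed tweak: lazy (.+?) stops at the nearest </span>.
$lazyPattern = '#<span class="[^"]*last_position[^"]*">(.+?)</span>#';
preg_match_all($lazyPattern, $sameLine, $lazy);

print_r($greedy[1]);
print_r($lazy[1]);
```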

edited Apr 26 '13 at 09:01

Gordon


answered Apr 26 '13 at 07:55

Nikolas


score 1 · Accepted Answer · answered Apr 26 '13 at 07:55

1

Try the following (and yes you can use regex to match data from HTML):

$string = '<span class="down last_position">xyz</span>
<span class="up last_position">xyz</span>
<span class="last_position new">xyz</span>';

preg_match_all('#<span\s.*?class=".*?last_position.*?".*?>(.+?)</span>#i', $string, $m);
print_r($m);

Online demo.
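
For readers wondering how the pattern works, here it is again, spelled out piece by piece with the x (extended) flag. This is only an annotated sketch: the ~ delimiter is swapped in because # starts a comment in extended mode, and the comments are an interpretation, not the answerer's own explanation.

```php
<?php
$string = '<span class="down last_position">xyz</span>
<span class="up last_position">xyz</span>
<span class="last_position new">xyz</span>';

// Same pattern as above; in x mode, whitespace and # comments inside it are ignored.
$pattern = '~
    <span\s               # tag name followed by one whitespace character
    .*?                   # any attributes before class, as few characters as possible
    class="               # start of the class attribute
    .*?last_position.*?   # last_position with anything around it (. also matches quotes)
    "                     # a double quote after it
    .*?>                  # anything up to the end of the opening tag
    (.+?)                 # group 1: the text, lazily, up to the nearest closing tag
    </span>
~xi';

preg_match_all($pattern, $string, $m);
print_r($m[1]);
```

Because .*? can also run across quotes and other attributes, the pattern is looser than it looks, which is the weakness raised in the comments.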

answered Apr 26 '13 at 07:55

HamZa


2

I accepted this answer because my question was not how to do it with DomDocument but with regex. And it was very detailed, thank you :) – Sebastian Apr 26 '13 at 08:06
1

please explain the pattern. Why does this work? – Gordon Apr 26 '13 at 08:30
@HamZaDzCyberDeV: `almost all situations` How do you quantify "almost all"? http://regex101.com/r/tZ3pA2 I can't do anything about my votes, since they are locked, but I don't think this answer deserves an upvote. – nhahtdh Apr 26 '13 at 08:32
@nhahtdh Ok you got me there, I'm not gonna argue since you're against the idea of using regex for HTML in the first place. – HamZa Apr 26 '13 at 08:50

score 0 · Answer 5 · answered Apr 26 '13 at 08:00

0

Sure, parsing XML in general is not possible with regex, because XML is not a regular language. But in many real-world cases, the XML documents used as input are limited and predictable enough to simply be treated as text.

Something like this should work for you:

preg_match('#<span class="[^>"]*?last_position[^>"]*">(.+)</span>#', $string, $matches);

answered Apr 26 '13 at 08:00

richardtallent


please explain the pattern. Why does this work? – Gordon Apr 26 '13 at 08:29
