PHP Help REGEX / Explanation

Question

Possible Duplicate:
Grabbing the href attribute of an A element

<td class="lbs" colspan="4">
    <span>4 Pack.</span>
    <span>
        <span class="strike">20.5</span>
        <span class="lbs">9.5</span>
    </span>
</td>

Hey guys! I need some help. I've tried my best searching for an answer, but I seem to have hit a wall.

I'm using cURL to fetch the HTML above from multiple pages.

I'm trying to extract each of the values (e.g. "4 Pack.", "20.5" and "9.5") and assign them to variables so I can store them in a database.

Use DOMDocument (http://php.net/manual/en/class.domdocument.php) and, if you want to make lookups easy, DOMXPath. — Ian, Jan 16 '13 at 14:10
Regular expressions aren't a magic wand that you wave at every problem that happens to involve strings. — Andy Lester, Jan 16 '13 at 15:12
@Gordon: Not really. That question deals specifically with grabbing attributes from HTML tags. This one deals with getting the contents of a given tag. Though I agree with the other commenters that this is not a job for regex. — WWW, Jan 16 '13 at 15:55
@Gordon: I'm not disputing that, I guess you and I just fall on different sides of this post: http://meta.stackexchange.com/questions/123976/change-wording-of-exact-duplicate — WWW, Jan 16 '13 at 16:02
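Following Ian's suggestion, here is a minimal sketch using PHP's built-in DOMDocument and DOMXPath classes. It assumes the fragment sits inside a table row (the `<table><tr>` wrapper is added here for illustration only), and the array keys `pack`, `was` and `price` are hypothetical names chosen for this example, not anything from the original page.

```php
<?php
// The question's fragment, wrapped in a hypothetical <table><tr> so it parses as a normal table cell.
$html = <<<HTML
<table><tr>
<td class="lbs" colspan="4">
    <span>4 Pack.</span>
    <span>
        <span class="strike">20.5</span>
        <span class="lbs">9.5</span>
    </span>
</td>
</tr></table>
HTML;

// Scraped pages are rarely valid HTML, so collect libxml warnings instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$rows = [];

// Select every <td> whose class list contains the token "lbs", regardless of attribute order.
$cells = $xpath->query('//td[contains(concat(" ", normalize-space(@class), " "), " lbs ")]');
foreach ($cells as $td) {
    $rows[] = [
        // Hypothetical keys; string() yields "" if the span is missing.
        'pack'  => trim($xpath->evaluate('string(./span[1])', $td)),
        'was'   => trim($xpath->evaluate('string(.//span[@class="strike"])', $td)),
        'price' => trim($xpath->evaluate('string(.//span[@class="lbs"])', $td)),
    ];
}

print_r($rows);
```

Because the query selects the cell by its `lbs` class token rather than by the literal tag text, it keeps working if the attribute order or quoting changes, which is the main advantage over the regex answers below.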

score 0 · Answer 1 · answered Jan 16 '13 at 14:30


This works:

preg_match_all('@<td class="lbs" colspan="4">\s*<span>([^<]+)</span>\s*<span>\s*<span class="strike">([0-9\.]+)</span>\s*<span class="lbs">([0-9\.]+)</span>\s*</span>\s*</td>@ims', $source, $matches);
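To go from `$matches` to variables, a sketch like the following could work. It runs the answer's pattern unchanged against the question's snippet; `PREG_SET_ORDER` (an option not used in the answer) groups the three captures of each `<td>` together, which is convenient when a page contains several of them. The keys `pack`, `was` and `price`, and the cast to float, are illustrative choices, not part of the original answer.

```php
<?php
// In practice $source would be the page body returned by cURL; here it is the question's snippet.
$source = <<<HTML
<td class="lbs" colspan="4">
    <span>4 Pack.</span>
    <span>
        <span class="strike">20.5</span>
        <span class="lbs">9.5</span>
    </span>
</td>
HTML;

// The pattern from the answer above, unchanged.
$pattern = '@<td class="lbs" colspan="4">\s*<span>([^<]+)</span>\s*<span>\s*<span class="strike">([0-9\.]+)</span>\s*<span class="lbs">([0-9\.]+)</span>\s*</span>\s*</td>@ims';

// PREG_SET_ORDER: one array per match, holding [full match, group 1, group 2, group 3].
preg_match_all($pattern, $source, $matches, PREG_SET_ORDER);

$rows = [];
foreach ($matches as $m) {
    $rows[] = [
        'pack'  => trim($m[1]),      // text of the first <span>
        'was'   => (float) $m[2],    // value of the span with class "strike"
        'price' => (float) $m[3],    // value of the span with class "lbs"
    ];
}

print_r($rows);
```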


jakoubekcz


That works for today, with that exact piece of HTML. It will break when `` becomes `` or `` or maybe the `colspan` disappears. You can't rely on the textual layout of HTML. That's why you use DOMDocument or another HTML parser. – Andy Lester Jan 16 '13 at 15:11
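Andy Lester's point is easy to check. The sketch below runs the first answer's pattern against the question's snippet and two variants: one without `colspan` (mentioned in the comment) and one with the two attributes swapped (a hypothetical change added here for illustration). Only the untouched markup matches.

```php
<?php
// The first answer's pattern, unchanged.
$pattern = '@<td class="lbs" colspan="4">\s*<span>([^<]+)</span>\s*<span>\s*<span class="strike">([0-9\.]+)</span>\s*<span class="lbs">([0-9\.]+)</span>\s*</span>\s*</td>@ims';

$original = <<<HTML
<td class="lbs" colspan="4">
    <span>4 Pack.</span>
    <span>
        <span class="strike">20.5</span>
        <span class="lbs">9.5</span>
    </span>
</td>
HTML;

// Same data, slightly different markup.
$variants = [
    'original'      => $original,
    'no colspan'    => str_replace(' colspan="4"', '', $original),
    'swapped attrs' => str_replace('class="lbs" colspan="4"', 'colspan="4" class="lbs"', $original),
];

foreach ($variants as $label => $html) {
    // preg_match_all() returns the number of full matches found.
    printf("%-13s %d match(es)\n", $label, preg_match_all($pattern, $html));
}
```

A DOMXPath query that selects the cell by its class token would not be affected by either change.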

score 0 · Answer 2 · answered Jan 16 '13 at 14:54


Do you really HAVE to use regex?

You have structured data, so just read the DOM.

If you MUST use regex, try this:

@<[^>]+>[\n\r ]*([^<>\n]+)</[^>]+>@

EDIT

Usage:

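$results = array();
// $data is assumed to hold the HTML string fetched with cURL
preg_match_all(
    '@<[^>]+>[\n\r ]*([^<>\n]+)</[^>]+>@',
    $data, $results, PREG_PATTERN_ORDER
);
// $results[1] now holds the text captured from each matching tag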
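Applied to the question's snippet, the usage above should leave the three values in `$results[1]`. The sketch below confirms that; `$data` is filled with the snippet as a stand-in for the cURL response, and `$pack`, `$was` and `$price` are hypothetical variable names. Note that the pattern is generic: it captures the content of any element whose content is a single line of plain text, so on a full page `$results[1]` will also contain text from unrelated tags.

```php
<?php
// Stand-in for the HTML fetched with cURL.
$data = <<<HTML
<td class="lbs" colspan="4">
    <span>4 Pack.</span>
    <span>
        <span class="strike">20.5</span>
        <span class="lbs">9.5</span>
    </span>
</td>
HTML;

$results = array();
preg_match_all(
    '@<[^>]+>[\n\r ]*([^<>\n]+)</[^>]+>@',
    $data, $results, PREG_PATTERN_ORDER
);

// $results[0]: full matches such as "<span>4 Pack.</span>"; $results[1]: the captured text.
list($pack, $was, $price) = $results[1];

printf("%s / %s / %s\n", $pack, $was, $price);
```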

edited Jan 16 '13 at 15:07

answered Jan 16 '13 at 14:54

Oliver A.

