
Possible Duplicate:
Grabbing the href attribute of an A element

<td class="lbs" colspan="4">
    <span>4 Pack.</span>
    <span>
        <span class="strike">20.5</span>
        <span class="lbs">9.5</span>
    </span>
</td>

Hey guys! I need some help. I've tried my best searching for an answer, but I seem to have hit a wall.

I'm using curl to grab the piece of code above from multiple pages.

I'm trying to grab each of the values (e.g. "4 Pack.", "20.5", "9.5") and assign them to variables so I can pass them to a db.
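For the database step, a prepared statement keeps the scraped text out of the SQL string itself. A minimal sketch, assuming PDO; the `products` table and its column names are hypothetical, and SQLite in memory is used only so the example runs on its own (swap in your real DSN, e.g. MySQL):

```php
<?php
// Hypothetical schema: one column per extracted value.
// SQLite in memory is only here so the sketch is self-contained.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE products (pack TEXT, was_price REAL, now_price REAL)');

// Values as they would come out of the scraped HTML.
$pack     = '4 Pack.';
$wasPrice = '20.5';
$nowPrice = '9.5';

// Placeholders are bound by PDO, so the scraped strings never become SQL.
$stmt = $pdo->prepare(
    'INSERT INTO products (pack, was_price, now_price) VALUES (:pack, :was, :now)'
);
$stmt->execute([
    ':pack' => $pack,
    ':was'  => (float) $wasPrice,
    ':now'  => (float) $nowPrice,
]);
```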

  • This looks like a job for `DOMDocument` – John Dvorak Jan 16 '13 at 14:07
  • Use DOMDocument http://php.net/manual/en/class.domdocument.php and, if you want to make lookups easy, DOMXPath. – Ian Jan 16 '13 at 14:10
  • Also, what have you tried? – WWW Jan 16 '13 at 14:11
  • Yep. DOMDocument. This is definitely *not* a job for regex. – SDC Jan 16 '13 at 14:12
  • Regular expressions aren't a magic wand that you wave at every problem that happens to involve strings. – Andy Lester Jan 16 '13 at 15:12
  • @Gordon: Not really. That question deals specifically with grabbing attributes from HTML tags. This one deals with getting the contents of a given tag. Though I agree with the other commenters that this is not a job for regex. – WWW Jan 16 '13 at 15:55
  • @Gordon: I'm not disputing that, I guess you and I just fall on different sides of this post: http://meta.stackexchange.com/questions/123976/change-wording-of-exact-duplicate – WWW Jan 16 '13 at 16:02
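Following the DOMDocument/DOMXPath suggestion, a minimal sketch of pulling the three values out of the cell shown in the question. It assumes the page markup matches the snippet; the helper name `extractPackPrices` and the array keys `pack`, `was` and `now` are hypothetical:

```php
<?php
// Hypothetical helper: returns the pack label, struck-through price and
// current price from the first <td class="lbs"> cell, or null if none exists.
function extractPackPrices(string $html): ?array
{
    $doc = new DOMDocument();
    // Scraped HTML is rarely valid; keep libxml warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $cell = $xpath->query('//td[@class="lbs"]')->item(0);
    if ($cell === null) {
        return null; // no matching cell on this page
    }

    // Queries below are relative to the cell, so other markup is ignored.
    $pack   = $xpath->query('./span[1]', $cell)->item(0);
    $strike = $xpath->query('.//span[@class="strike"]', $cell)->item(0);
    $now    = $xpath->query('.//span[@class="lbs"]', $cell)->item(0);

    return [
        'pack' => $pack   ? trim($pack->textContent)   : null,
        'was'  => $strike ? trim($strike->textContent) : null,
        'now'  => $now    ? trim($now->textContent)    : null,
    ];
}

$html = <<<'HTML'
<table><tr>
<td class="lbs" colspan="4">
    <span>4 Pack.</span>
    <span>
        <span class="strike">20.5</span>
        <span class="lbs">9.5</span>
    </span>
</td>
</tr></table>
HTML;

$values = extractPackPrices($html);
```

In practice `$html` would be the page body fetched with curl. Note that `@class="lbs"` only matches when `lbs` is the entire class attribute; a cell carrying several classes would need a `contains()` test instead.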

2 Answers


This works:

preg_match_all('@<td class="lbs" colspan="4">\s*<span>([^<]+)</span>\s*<span>\s*<span class="strike">([0-9\.]+)</span>\s*<span class="lbs">([0-9\.]+)</span>\s*</span>\s*</td>@ims', $source, $matches);
  • That works for today, with that exact piece of HTML. It will break when the markup of those tags changes, or maybe the `colspan` disappears. You can't rely on the textual layout of HTML. That's why you use DOMDocument or another HTML parser. – Andy Lester Jan 16 '13 at 15:11
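As the comment points out, this only matches this exact markup. With that caveat, a sketch of turning the captures into variables, using `PREG_SET_ORDER` so each matched cell comes back as one array (`$source` stands in for the fetched page; `$rows` is a hypothetical name):

```php
<?php
// The pattern from the answer, unchanged.
$pattern = '@<td class="lbs" colspan="4">\s*<span>([^<]+)</span>\s*<span>\s*<span class="strike">([0-9\.]+)</span>\s*<span class="lbs">([0-9\.]+)</span>\s*</span>\s*</td>@ims';

// Stand-in for the page fetched with curl.
$source = '<td class="lbs" colspan="4">
    <span>4 Pack.</span>
    <span>
        <span class="strike">20.5</span>
        <span class="lbs">9.5</span>
    </span>
</td>';

$rows = [];
// PREG_SET_ORDER groups each match's captures together, one array per cell.
if (preg_match_all($pattern, $source, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        // $m[0] is the whole match; $m[1]..$m[3] are the capture groups.
        [, $pack, $wasPrice, $nowPrice] = $m;
        $rows[] = [trim($pack), $wasPrice, $nowPrice];
    }
}
```

With the default `PREG_PATTERN_ORDER`, the same values would instead sit in `$matches[1][0]`, `$matches[2][0]` and `$matches[3][0]`.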

Do you really HAVE to use regex?

You have structured data, so just read the DOM.
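Reading the DOM also copes with the kind of markup changes that break a literal regex. A sketch, assuming the page has cells like the one in the question: it matches `lbs` as a class token and collects the text of every innermost `<span>`, so reordered or re-quoted attributes do not matter (`leafSpanTexts` is a hypothetical name):

```php
<?php
// Collect the text of every innermost <span> inside cells whose class list
// contains "lbs". Matching the class token, not the literal tag text, means
// attribute order and quoting style do not affect the lookup.
function leafSpanTexts(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $query = '//td[contains(concat(" ", normalize-space(@class), " "), " lbs ")]'
           . '//span[not(*)]'; // spans with no child elements

    $texts = [];
    foreach ($xpath->query($query) as $span) {
        $texts[] = trim($span->textContent);
    }
    return $texts;
}

// Same cell as in the question, but with attributes reordered and re-quoted.
$html = "<table><tr><td colspan='4' class='lbs'>
    <span>4 Pack.</span>
    <span><span class='strike'>20.5</span> <span class='lbs'>9.5</span></span>
</td></tr></table>";

$texts = leafSpanTexts($html);
```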

If you MUST use regex, try this:

@<[^>]+>[\n\r ]*([^<>\n]+)</[^>]+>@

EDIT

Usage:

$results = array();
preg_match_all(
    '@<[^>]+>[\n\r ]*([^<>\n]+)</[^>]+>@',
    $data, $results, PREG_PATTERN_ORDER
);
Oliver A.
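Applied to the snippet from the question, a sketch of what this pattern captures. It assumes the three values always appear in the same order, which is what the `list()` assignment relies on:

```php
<?php
// The generic pattern from the answer, applied to the question's snippet.
$data = '<td class="lbs" colspan="4">
    <span>4 Pack.</span>
    <span>
        <span class="strike">20.5</span>
        <span class="lbs">9.5</span>
    </span>
</td>';

$results = array();
preg_match_all(
    '@<[^>]+>[\n\r ]*([^<>\n]+)</[^>]+>@',
    $data, $results, PREG_PATTERN_ORDER
);

// With PREG_PATTERN_ORDER, $results[1] lists every capture of group 1 in order.
// This only works while the values keep exactly this order on the page.
list($pack, $wasPrice, $nowPrice) = $results[1];
```

Here `$results[1]` holds `"4 Pack."`, `"20.5"` and `"9.5"`, while `$results[0]` holds the full matched tags such as `<span>4 Pack.</span>`.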