I am trying to write a regular expression that extracts all elements within the ELEMENTS row.

Say that I have this HTML string:

<tr> <td> ELEMENTS</td> <td> <element>A1</element> , <element>A2</element> </td></tr><tr> <td> MORE_ELEMENTS</td> <td><element> A3</element>, <element> A4</element>, <element> A5</element> </td></tr>

I want to extract all elements within the ELEMENTS row (A1 and A2) but not the elements within the MORE_ELEMENTS row (A3, A4 and A5).

Using this regexp you can match all elements:

<element>([^<]+)<\/element>\s*,*\s*

But if I try to restrict to ELEMENTS using this regexp:

<td>\s*ELEMENTS.*?<element>([^<]+)<\/element>\s*,*\s*

I match only the first element. I don't know how to match the ELEMENTS row and then iterate within it to extract all elements.

I also tried this, but it didn't work either:

<td>\s*ELEMENTS.*?<element>([^<]+)<\/element>\s*,*\s*(<element>([^<]+)<\/element>\s*,*\s*)*

Any ideas? Thanks very much in advance!

Migsy

  • You don't want to do this with regexp; look for a mechanize-like tool instead. http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – Anders Lindahl May 27 '11 at 09:08
  • Then check out the answers here: http://stackoverflow.com/questions/292926/robust-mature-html-parser-for-php – Anders Lindahl May 27 '11 at 09:23

2 Answers

    $test = '<tr> <td> ELEMENTS</td> <td> <element>A1</element> , <element>A2</element> </td></tr><tr> <td> MORE_ELEMENTS</td> <td><element> A3</element>, <element> A4</element>, <element> A5</element> </td></tr>';

    // $match[1] receives the captured text of every <element> found in $test
    preg_match_all('~<element>([^<]*)</element>~', $test, $match);

    foreach ($match[1] as $value) {
        // do what you intended
    }

http://php.net/manual/en/function.preg-match-all.php

akond

It is generally considered a bad idea to parse HTML or XML with regular expressions. It doesn't work in the general case, but works fine for specific cases if you understand the limitations.

However, just because you want to use regular expressions, there's no reason to insist on only using regular expressions. For this problem, use a regular expression to extract your block, then use another expression or function on the result. You don't win any bonus points for cramming as much into one expression as possible. To the contrary, it's better to write readable code than to write "clever" code.
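As a rough illustration of that two-step idea, here is a minimal sketch in PHP. The helper name `extractRowElements` is made up for this example, and the patterns assume the markup looks exactly like the sample string in the question (plain `<tr>` and `<td>` tags without attributes, no nested rows), so treat it as a starting point rather than a general solution.

```php
<?php
// Sample input copied from the question.
$html = '<tr> <td> ELEMENTS</td> <td> <element>A1</element> , <element>A2</element> </td></tr>'
      . '<tr> <td> MORE_ELEMENTS</td> <td><element> A3</element>, <element> A4</element>, <element> A5</element> </td></tr>';

// Hypothetical helper: returns the trimmed <element> values of the first row
// whose first cell contains exactly $label.
function extractRowElements(string $html, string $label): array
{
    // Step 1: isolate the row. The label has to sit directly between <td> and
    // </td> (whitespace aside), so "ELEMENTS" does not match "MORE_ELEMENTS".
    // The lazy (.*?) stops at the first </tr>; the s modifier lets "." span newlines.
    $rowPattern = '~<tr>\s*<td>\s*' . preg_quote($label, '~') . '\s*</td>(.*?)</tr>~s';
    if (!preg_match($rowPattern, $html, $row)) {
        return []; // no such row
    }

    // Step 2: run the simple element pattern on the extracted block only.
    preg_match_all('~<element>\s*([^<]*?)\s*</element>~', $row[1], $matches);

    return $matches[1];
}

print_r(extractRowElements($html, 'ELEMENTS'));
```

For the sample string this yields A1 and A2 for `ELEMENTS`, and A3, A4 and A5 for `MORE_ELEMENTS`. Each pattern stays short enough to read at a glance, which is the point of splitting the work.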

This is especially true when you need someone to write the expression for you. If you don't understand regular expressions enough to solve your problem, aim to limit the complexity of your expressions as much as possible by seeking out other solutions or breaking one large, complex pattern into several smaller, more comprehensible patterns.

Bryan Oakley