I've spent a good few hours trying to get this regular expression to work, and all I've got so far is one big headache!

I'm using cURL to load a page into the variable $o. Somewhere in this page is the following:

        <tr valign="top">
   <td>value1</td>
   <td>value2</td>
   <td align="right">value3</td>
  </tr>

This block is repeated three or so times. Naturally, I'd like to grab value1, value2 and value3 from each one and store them in an array. Here's my attempt:

  preg_match_all('/<tr valign="top"><td>(.*)<\/td><td>(.*)<\/td><td align="right">(.*)<\/td><\/tr>/',
                        $o,
                        $out);

But all this seems to output is an empty array. Can anyone spot where I've gone wrong?

Alan Moore
Michael

3 Answers

Don't use regular expressions to parse HTML. Use an HTML parser.
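A minimal sketch of that approach in PHP, assuming the bundled DOM extension (DOMDocument and DOMXPath) is available. Unlike a regex, the parser does not care about the newlines and indentation between the tags, and loadHTML tolerates markup that is not well-formed XML. The extractRows function name and the sample markup are illustrative only; in practice you would pass in the page fetched with cURL ($o in the question).

```php
<?php
// Hypothetical helper: returns one [value1, value2, value3] array per matching row.
function extractRows(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings about sloppy HTML instead of printing them
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $rows = [];
    // Find every top-aligned row anywhere in the document
    foreach ($xpath->query("//tr[@valign='top']") as $tr) {
        $cells = [];
        // Query the td children relative to the current row
        foreach ($xpath->query('td', $tr) as $td) {
            $cells[] = trim($td->textContent);
        }
        // Keep only rows with the expected three cells
        if (count($cells) === 3) {
            $rows[] = $cells;
        }
    }
    return $rows;
}

// Sample markup copied from the question, wrapped in a table
$html = <<<HTML
<table>
        <tr valign="top">
   <td>value1</td>
   <td>value2</td>
   <td align="right">value3</td>
  </tr>
</table>
HTML;

print_r(extractRows($html));
```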

Andy Lester

Just make your life easier:

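// SimpleXMLElement needs well-formed XML; tag-soup HTML makes the constructor throw
$dom = new SimpleXMLElement($curlResponse);
// "//" searches the whole document, not just the direct children of the root element
$candidates = $dom->xpath("//tr[@valign='top']");

foreach ($candidates as $tr)
{
   // Exactly three cells, and the third one is right-aligned
   // (a missing align attribute casts to an empty string)
   if (count($tr->td) == 3 && (string) $tr->td[2]['align'] === 'right')
   {
      foreach ($tr->td as $td)
      {
          // do something with the value (string) $td
      }
   }
}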

You could probably simplify that further by moving some of the tests directly into the XPath expression: find a unique td signature within the structure, go back up to the parent tr, and iterate over its tds. But I'm far from an XPath guru, so I keep it simple :-)
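One way to fold those checks into the query is a single XPath expression with stacked predicates, so the PHP loop no longer needs its own if. This is a sketch that keeps SimpleXMLElement, so it assumes the markup is well-formed XML; the second, one-cell row in the sample is made up to show that the filter rejects it.

```php
<?php
$xml = <<<XML
<table>
  <tr valign="top">
   <td>value1</td>
   <td>value2</td>
   <td align="right">value3</td>
  </tr>
  <tr valign="top">
   <td>only one cell</td>
  </tr>
</table>
XML;

$dom = new SimpleXMLElement($xml);

// Top-aligned rows with exactly three cells whose third cell is right-aligned
$rows = $dom->xpath("//tr[@valign='top'][count(td) = 3][td[3][@align='right']]");

$values = [];
foreach ($rows as $tr) {
    $cells = [];
    foreach ($tr->td as $td) {
        // Cast each SimpleXMLElement cell to its text content
        $cells[] = (string) $td;
    }
    $values[] = $cells;
}

print_r($values);
```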

prodigitalson

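Your pattern doesn't allow for the newlines and indentation between the tags. Try

  preg_match_all('/<tr valign="top">.*?<td>(.*?)<\/td>.*?<td>(.*?)<\/td>.*?<td align="right">(.*?)<\/td>.*?<\/tr>/s',
                    $o,
                    $out);

The /s modifier makes the dot match every character, including newlines (normally it doesn't). The ? after each * makes the quantifier lazy, so each match stops at the first closing tag it reaches instead of running on through the following rows; with greedy stars, the repeated rows would be swallowed into a single match. If you still run into problems, it might be because there are other tds or trs in the output.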
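To end up with the array the question asks for, one entry per row, preg_match_all can take the PREG_SET_ORDER flag, which groups the captures by match rather than by capture group. A sketch using the lazy form of the pattern; the second row and its values (value4 to value6) are made-up sample data standing in for the repeated blocks.

```php
<?php
$o = <<<HTML
<table>
        <tr valign="top">
   <td>value1</td>
   <td>value2</td>
   <td align="right">value3</td>
  </tr>
        <tr valign="top">
   <td>value4</td>
   <td>value5</td>
   <td align="right">value6</td>
  </tr>
</table>
HTML;

// Lazy .*? everywhere; /s lets the dot cross the newlines between tags
$pattern = '/<tr valign="top">.*?<td>(.*?)<\/td>.*?<td>(.*?)<\/td>.*?<td align="right">(.*?)<\/td>.*?<\/tr>/s';
preg_match_all($pattern, $o, $matches, PREG_SET_ORDER);

$rows = [];
foreach ($matches as $m) {
    // $m[0] is the whole matched row; $m[1]..$m[3] are the cell values
    $rows[] = [$m[1], $m[2], $m[3]];
}

print_r($rows);
```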

mwhite