I've got a regex here to pull out all of the <tr> tags in a page. It looks like this:

preg_match_all('%<tr[^>]++>(.*?)</tr>%s', $pageText, $rows);

The problem is that, while it does find all of the <tr> tags on the page, the result is a multidimensional array in which each entry of the first array contains an array of all of the matches. In other words, it hands me multiple identical copies of the first array, i.e. the one I actually want.

Help please?

EDIT: Also relevant: I'm not allowed to use DOM for this application despite it being a significantly easier (and better) way of going about things.

moberemk

2 Answers

Try this one:

preg_match_all('~<tr(?:\\s+[^>]*)?>(.*?)</tr>~si', $pageText, $rows);
var_dump($rows[1]);

Avoid % as the regex delimiter. It is a valid delimiter, but % is also the format-specifier character in printf()-like functions, so a pattern that ends in %s or %i can be quite confusing to read.
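
To make the shape of the result concrete, here is a minimal sketch. The sample $pageText is made up for illustration and is not from the question. By default preg_match_all() groups results by capture group, so $rows[0] lists the full matches and $rows[1] lists what (.*?) captured. If one entry per match reads more naturally, the PREG_SET_ORDER flag regroups the same data that way.

```php
<?php
// Hypothetical sample input, only for illustration.
$pageText = "<table><tr class=\"a\"><td>one</td></tr>\n<TR><td>two</td></TR></table>";

// Default ordering (PREG_PATTERN_ORDER): results are grouped by capture group.
preg_match_all('~<tr(?:\\s+[^>]*)?>(.*?)</tr>~si', $pageText, $rows);

// $rows[0] = every full <tr>...</tr> match
// $rows[1] = the inner content captured by (.*?)
print_r($rows[1]);

// PREG_SET_ORDER: one entry per match, each holding [full match, group 1].
preg_match_all('~<tr(?:\\s+[^>]*)?>(.*?)</tr>~si', $pageText, $sets, PREG_SET_ORDER);
print_r($sets[0]);
```

Either way the inner row content is the same; only the grouping of the returned array differs.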

CodeAngry

What you're actually asking about is the $rows[0] list, which redundantly contains the whole <tr>...</tr> blob again. If you only care about the (.*?) inner data, put \K at the end of the pattern to reset the full match.

preg_match_all('=<tr\b[^>]*+>(.*?)</tr>\K=s', $pageText, $rows);

It's not possible to get rid of $rows[0] completely. You'll have to ignore it and use $rows[1] alone.
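
A small sketch of what that looks like in practice, using a made-up $pageText and a hypothetical $cells variable for the part worth keeping. The trailing \K moves the start of the reported match to the end of </tr>, so the entries in $rows[0] shrink to empty strings while the capture group is unaffected.

```php
<?php
// Hypothetical sample input, only for illustration.
$pageText = '<tr><td>one</td></tr><tr id="x"><td>two</td></tr>';

preg_match_all('=<tr\b[^>]*+>(.*?)</tr>\K=s', $pageText, $rows);

// $rows[0] is still present, but \K has reduced each full match to an
// empty string. The useful data lives in capture group 1.
$cells = $rows[1];
print_r($cells);
```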

mario