0

I need a regular expression that finds the content within <td> and </td> tags, for use in PHP. I have tried...

preg_split("<td>([^\"]*)</td>", $table[0]);

But that gives me the PHP error...

Warning: preg_split(): Unknown modifier '(' in C:\xampp\htdocs\.....

Can anyone tell me what I am doing wrong?

David Carpenter
  • btw don't parse html with regex... http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – Sergio Feb 14 '14 at 13:22
  • You should start and end the pattern with a delimiter, usually `/`; moreover, you have to escape the `/` in `</td>`: `preg_split("/<td>([^\"]*)<\/td>/", $table[0]);` – Spiros Feb 14 '14 at 13:23

4 Answers

1

Try this:

preg_match("/<td>([^\"]*)<\/td>/", $table[0], $matches);

But, as a general rule, please, do not try to parse HTML with regexes... :-)
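For comparison, here is a minimal sketch of the non-regex route using PHP's built-in DOM extension. The sample markup and the variable names ($html, $cells) are made up for illustration; in the question the input would be $table[0].

```php
<?php
// Hypothetical sample markup standing in for $table[0].
$html = '<table><tr><td>first</td><td class="x">second "quoted"</td></tr></table>';

$doc = new DOMDocument();
// Collect libxml warnings about imperfect markup instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

// Gather the text content of every <td> element.
$cells = [];
foreach ($doc->getElementsByTagName('td') as $td) {
    $cells[] = $td->textContent;
}

print_r($cells);
```

This copes with attributes on the tags and with quotes inside the cells, both of which the regex variants have to handle explicitly.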

MarcoS
1

Keep in mind that * is greedy: it matches as much as it can. Without some extra work, the * between <td> and </td> in your regular expression will run from the first <td> to the last </td> it can reach, slurping up a whole row of <td>some text</td> cells in a single match.

To turn off the greediness of *, put a ? after it. This makes it match as little as possible, so it stops the first time it reaches whatever comes after the *. So, the regular expression you're looking for is something like:

/<td>(.*?)<\/td>/

Remember, since the regular expression starts and ends with a /, you have to be careful about any / that is inside your regular expression - they have to be escaped. Hence, the \/.
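A short sketch of the delimiter rule, using a made-up one-cell input ($row is hypothetical). It also explains the original warning: PHP accepts bracket pairs such as < and > as delimiters, so in "<td>(...)" the pattern was read as just td, and the ( that followed was taken as a modifier.

```php
<?php
$row = '<td>one</td>';  // hypothetical sample input

// With "/" as the delimiter, the "/" in "</td>" must be escaped.
preg_match('/<td>(.*?)<\/td>/', $row, $withSlash);

// With "#" as the delimiter, no escaping is needed.
preg_match('#<td>(.*?)</td>#', $row, $withHash);

// Both patterns capture the same cell content.
var_dump($withSlash[1] === $withHash[1]);
```

Picking a delimiter that does not occur in the pattern is often the easier option when matching HTML.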

From your regular expression, it looks like you're also trying to exclude any " character that might be between a <td> and </td> - is that correct? If that were the case, you would change the regular expression to use the following:

/<td>([^\"]*?)<\/td>/

But, assuming you don't want to exclude the " character in your matches, your PHP code could look like this, using preg_match_all instead of preg_match.

preg_match_all("/<td>(.*?)<\/td>/", $str, $matches);
print_r($matches);

What you're looking for is in $matches[1].
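To see the difference between the greedy and the non-greedy form, here is a small sketch on a made-up row ($str here is a hypothetical sample, not the asker's data):

```php
<?php
$str = '<tr><td>a</td><td>b</td></tr>';  // hypothetical sample row

// Greedy: .* runs to the last </td>, so both cells collapse into one match.
preg_match_all('/<td>(.*)<\/td>/', $str, $greedy);

// Non-greedy: .*? stops at the first </td>, giving one match per cell.
preg_match_all('/<td>(.*?)<\/td>/', $str, $lazy);

print_r($greedy[1]);  // one element: a</td><td>b
print_r($lazy[1]);    // two elements: a and b
```

Note that . does not match a newline by default, so a cell whose content spans several lines would also need the s modifier, as in /<td>(.*?)<\/td>/s.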

Alvin S. Lee
0

Use preg_match instead of preg_split

preg_match("|<td>([^<]*)</td>|", $table[0], $m);
print_r($m);
Sabuj Hassan
0

First of all, you forgot to wrap the regex in delimiters. Also, you shouldn't specify the closing td tag in the regex.

Try the following code. Assuming $table[0] contains the HTML between the <table> and </table> tags, it extracts any content (including HTML) from the table's cells:

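// Split on every opening <td ...> tag, drop the text before the first cell,
// then strip the closing </td> tag from each remaining piece.
// Markup that follows a cell's </td> (for example </tr>) stays in that piece.
$a_result = array_map(
    function($v) { return preg_replace('/<\/td\s*>/i', '', $v); },
    array_slice(preg_split('/<td[^>]*>/i', $table[0]), 1)
);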
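One thing to watch with the split approach is that whatever follows a cell's closing tag, such as </tr> or </table>, ends up in the extracted piece. A possible variant, sketched here on made-up markup (the sample $table and the name $cells are assumptions), cuts each piece at its first </td> instead of only deleting the tag:

```php
<?php
// Hypothetical sample; $table[0] is assumed to hold the table markup.
$table = ['<table><tr><td class="a">x <b>y</b></td><td>z</td></tr></table>'];

$cells = array_map(
    // Keep only the text before the first closing </td> in each piece.
    function ($piece) { return preg_split('/<\/td\s*>/i', $piece, 2)[0]; },
    // Split on opening <td ...> tags and drop the text before the first cell.
    array_slice(preg_split('/<td[^>]*>/i', $table[0]), 1)
);

print_r($cells);
```

Like the original, this keeps any HTML inside a cell and tolerates attributes on the opening tag, but it still does not handle nested tables.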
hindmost