Regular expression for contents within <td> and </td>

Question

I need to find a regular expression to use for finding the content within <td> and </td> tags for use in PHP. I have tried...

preg_split("<td>([^\"]*)</td>", $table[0]);

But that gives me the PHP error...

Warning: preg_split(): Unknown modifier '(' in C:\xampp\htdocs\.....

Can anyone tell me what I am doing wrong?

btw don't parse html with regex... http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags — Sergio, Feb 14 '14 at 13:22
You should start and end the pattern with a delimiter, usually `/`; moreover, you have to escape the `/` in `</td>`: `preg_split("/<td>([^\"]*)<\/td>/", $table[0]);` — Spiros, Feb 14 '14 at 13:23

score 1 · Accepted Answer · edited May 23 '17 at 12:29

Try this:

preg_match("/<td>([^\"]*)<\/td>/", $table[0], $matches);

But, as a general rule, please, do not try to parse HTML with regexes... :-)
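
For readers who want to follow that advice, here is a minimal sketch of a parser-based alternative using PHP's bundled DOM extension. The sample markup and the variable names are assumptions for illustration; in the question the input would be `$table[0]`.

```php
<?php
// Hypothetical sample markup standing in for $table[0] from the question.
$html = '<table><tr><td>Alice</td><td>30</td></tr>'
      . '<tr><td>Bob</td><td>25</td></tr></table>';

// Collect libxml warnings internally instead of printing them;
// fragments and sloppy real-world markup often trigger them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Gather the text of every <td> element in document order.
$cells = [];
foreach ($doc->getElementsByTagName('td') as $td) {
    $cells[] = trim($td->textContent);
}

print_r($cells); // Alice, 30, Bob, 25
```

Unlike the regex, this keeps working when a cell carries attributes or nested tags, at the cost of requiring the DOM extension.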

answered Feb 14 '14 at 13:21 by MarcoS

score 1 · Answer 2 · answered Feb 14 '14 at 13:38

Keep in mind that you need to do some extra work to make sure that the * between <td> and </td> in your regular expression doesn't slurp up an entire line such as <td>some text</td><td>more text</td> in a single match. That's because * is greedy: it matches as much as it can.

To turn off the greediness of *, you can put a ? after it. This tells it to grab only up to the first place where whatever follows the * can match. So, the regular expression you're looking for is something like:

/<td>(.*?)<\/td>/
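
As a quick illustration of the difference, the sketch below runs the greedy and the lazy form against the same input. The sample string and variable names are made up for the example.

```php
<?php
// Hypothetical one-line input with two cells.
$row = '<td>one</td><td>two</td>';

// Greedy: .* runs to the end of the line, then backs up to the LAST </td>.
preg_match('/<td>(.*)<\/td>/', $row, $greedy);

// Lazy: .*? stops at the FIRST </td> it can reach.
preg_match('/<td>(.*?)<\/td>/', $row, $lazy);

echo $greedy[1], "\n"; // one</td><td>two
echo $lazy[1], "\n";   // one
```

Note that `.` does not match a newline by default, so a cell whose content spans several lines would need the `s` modifier as well.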

Remember, since the regular expression starts and ends with a /, you have to be careful about any / that is inside your regular expression - they have to be escaped. Hence, the \/.
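
If the escaping gets noisy, another delimiter can be used instead. The sketch below assumes `#` as the delimiter and a made-up input string; any non-alphanumeric, non-backslash, non-whitespace character works. PHP also accepts bracket pairs such as `<` and `>` as delimiters, which is why the original pattern was read as the pattern `td` followed by "modifiers" starting with `(`.

```php
<?php
// Hypothetical sample input.
$row = '<td>one</td><td>two</td>';

// Slash delimiter: the / in </td> must be escaped.
preg_match_all('/<td>(.*?)<\/td>/', $row, $withSlash);

// Hash delimiter: the pattern contains no #, so nothing needs escaping.
preg_match_all('#<td>(.*?)</td>#', $row, $withHash);

var_dump($withSlash[1] === $withHash[1]); // bool(true)
```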

From your regular expression, it looks like you're also trying to exclude any " character that might be between a <td> and </td> - is that correct? If that were the case, you would change the regular expression to use the following:

/<td>([^\"]*?)<\/td>/

But, assuming you don't want to exclude the " character in your matches, your PHP code could look like this, using preg_match_all instead of preg_match.

preg_match_all("/<td>(.*?)<\/td>/", $str, $matches);
print_r($matches);

What you're looking for is in $matches[1].
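
To make the layout of the result concrete, here is a small sketch with a made-up input. It also shows the quote-excluding variant from above and the optional `PREG_SET_ORDER` flag, which groups the results per match instead of per capture group.

```php
<?php
// Hypothetical sample input; the second cell contains " characters.
$str = '<tr><td>one</td><td>"two"</td></tr>';

// Default ordering: $matches[0] holds full matches, $matches[1] holds group 1.
preg_match_all('/<td>(.*?)<\/td>/', $str, $matches);
// $matches[0] is ['<td>one</td>', '<td>"two"</td>']
// $matches[1] is ['one', '"two"']

// Excluding " skips any cell that contains one.
preg_match_all('/<td>([^"]*?)<\/td>/', $str, $noQuotes);
// $noQuotes[1] is ['one']

// PREG_SET_ORDER: one sub-array per match, [full match, group 1].
preg_match_all('/<td>(.*?)<\/td>/', $str, $sets, PREG_SET_ORDER);
// $sets[1] is ['<td>"two"</td>', '"two"']

print_r($matches[1]);
```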

score 0 · Answer 3 · answered Feb 14 '14 at 13:22

Use preg_match instead of preg_split

preg_match("|<td>([^<]*)</td>|", $table[0], $m);
print_r($m);

answered Feb 14 '14 at 13:22 by Sabuj Hassan

hindmost · Answer 4 · 2014-02-14T16:39:13.223

First of all, you forgot to wrap the regex in delimiters. Also, you don't need to specify the closing td tag in the regex.

Try the following code. Assuming $table[0] contains the HTML between the <table> and </table> tags, it extracts any content (including HTML) from the table's cells:

$a_result = array_map(
    function($v) { return preg_replace('/<\/td\s*>/i', '', $v); },
    array_slice(preg_split('/<td[^>]*>/i', $table[0]), 1)
);
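
A short usage sketch of this split-based approach, wrapped in a hypothetical helper named `extractCells` and run on made-up markup. It shows that attributes, mixed case and nested tags are handled, and also that markup following the last cell (here `</tr>`) is left in the final element.

```php
<?php
// Hypothetical helper wrapping the split-based approach.
function extractCells(string $html): array
{
    return array_map(
        // Strip the closing </td> (any case, optional whitespace) from each piece.
        function ($v) { return preg_replace('/<\/td\s*>/i', '', $v); },
        // Split on opening <td ...> tags; drop the piece before the first cell.
        array_slice(preg_split('/<td[^>]*>/i', $html), 1)
    );
}

// Hypothetical sample standing in for $table[0].
$table = ['<tr><td class="a">one</td><TD><b>two</b></td ></tr>'];

$a_result = extractCells($table[0]);
print_r($a_result); // 'one' and '<b>two</b></tr>'
```

If the trailing markup matters, it would need to be trimmed in a further step, for example by also removing `</tr>` inside the callback.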
