Regular Expression - get tables from html string in PHP

Question

I'm trying to wrap all tables inside my content in a special div container to make them usable on mobile. I can't wrap the tables before they are saved in the database of the custom CMS. I managed to get hold of the content before it's printed on the page, and now I need to preg_replace all the tables there.

This is what I use to get all tables:

preg_match_all('/(<table[^>]*>(?:.|\n)*<\/table>)/', $aFile['sContent'], $aMatches);

The problem is getting the inner part (?:.|\n)* to match everything inside the tags without also matching the closing tag. Right now the expression matches everything, including the closing tag of the table...

Is there a way to exclude the match for the ending tag?

"Is there a way to exclude the match for the ending tag?" - Use a HTML parser and not regex — exussum, Jul 31 '14 at 08:14
You should use a lazy match; just try: preg_match_all('/(<table[^>]*>(?:.|\n)*?<\/table>)/', $aFile['sContent'], $aMatches); – Tim.Tang, Jul 31 '14 at 08:14
First of all, you should not use regex when it is not needed. Second, have a read here: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags and finally, use hek2mgl's answer. – Talisin, Jul 31 '14 at 08:26
Possible duplicate of [RegEx match open tags except XHTML self-contained tags](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) — Brian Tompsett - 汤莱恩, Jun 29 '17 at 18:42

hek2mgl · Accepted Answer · 2014-07-31T08:28:33.753


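You need to perform a non-greedy (lazy) match: /(<table[^>]*>(?:.|\n)*?<\/table>)/. Note the ? after the *: it makes the quantifier match as few characters as possible, so each match ends at the first </table> instead of running on to the last one.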
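A minimal PHP sketch of the difference, using a made-up string with two tables. The greedy pattern returns one match spanning both tables, while the lazy one returns two. It then uses the lazy pattern with preg_replace to wrap each table, which is what the question is after. The class name table-wrapper is a hypothetical placeholder, and the /s modifier (which lets . match newlines) is used in the replacement as a simpler stand-in for (?:.|\n).

```php
<?php
// Two tables separated by a paragraph, spread over several lines
$html = "<table class=\"a\">\n<tr><td>A</td></tr>\n</table>\n<p>text</p>\n<table>\n<tr><td>B</td></tr>\n</table>";

// Greedy (?:.|\n)* runs on to the LAST </table>: one match covering both tables
preg_match_all('/<table[^>]*>(?:.|\n)*<\/table>/', $html, $greedy);

// Lazy (?:.|\n)*? stops at the FIRST </table> after each opening tag: two matches
preg_match_all('/<table[^>]*>(?:.|\n)*?<\/table>/', $html, $lazy);

echo count($greedy[0]), "\n"; // 1
echo count($lazy[0]), "\n";   // 2

// With /s, . also matches newlines, so .*? does the job of (?:.|\n)*?.
// $0 in the replacement is the whole match; "table-wrapper" is a placeholder class.
$wrapped = preg_replace(
    '/<table[^>]*>.*?<\/table>/s',
    '<div class="table-wrapper">$0</div>',
    $html
);
echo $wrapped, "\n";
```

Like any regex approach, this breaks on nested tables: the lazy match stops at the inner table's </table>.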

However, I would use a DOM parser for that:

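$doc = new DOMDocument();
// Collect parser warnings (e.g. about HTML5 tags) instead of printing them
libxml_use_internal_errors(true);
$doc->loadHTML($html);

$tables = $doc->getElementsByTagName('table');
$contents = [];
foreach ($tables as $table) {
    // Serialize each <table> element, including its own tags, back to HTML
    $contents[] = $doc->saveHTML($table);
}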

A DOM parser is already the more convenient tool for extracting data from HTML documents, and it is definitely the better solution if you want to modify the HTML (as you said you do).
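Since the goal is to wrap every table in a div, here is a sketch of doing that with DOMDocument instead of preg_replace. The function name wrapTables and the default class table-responsive are hypothetical. It assumes the input is a UTF-8 HTML fragment rather than a full page, and places it inside a body element so that only the fragment is serialized again afterwards.

```php
<?php
// Sketch: wrap every <table> of an HTML fragment in a <div> using DOMDocument.
// wrapTables() and the default class name are hypothetical placeholders.
function wrapTables(string $html, string $class = 'table-responsive'): string
{
    $doc = new DOMDocument();

    // Collect libxml warnings (e.g. about HTML5 tags) instead of printing them
    $previous = libxml_use_internal_errors(true);
    // Put the fragment into a full document; the meta tag declares UTF-8
    $doc->loadHTML(
        '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>'
        . $html . '</body></html>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Copy the node list into an array before changing the tree
    $tables = iterator_to_array($doc->getElementsByTagName('table'));
    foreach ($tables as $table) {
        $wrapper = $doc->createElement('div');
        $wrapper->setAttribute('class', $class);
        // Insert the wrapper where the table is, then move the table into it
        $table->parentNode->insertBefore($wrapper, $table);
        $wrapper->appendChild($table);
    }

    // Serialize only the children of <body>, i.e. the original fragment
    $out = '';
    $body = $doc->getElementsByTagName('body')->item(0);
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo wrapTables('<p>Intro</p><table><tr><td>A</td></tr></table>'), "\n";
```

The list returned by getElementsByTagName may reflect changes to the tree while it is being iterated, so the sketch copies it into an array first. Nested tables need no special handling, because each table element is wrapped individually.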

edited Jul 31 '14 at 08:28

answered Jul 31 '14 at 08:15

hek2mgl


+1 for avoiding regex to parse HTML, which is not a regular language and hence should not be parsed with regular expressions. – Talisin Jul 31 '14 at 08:28
Thank you! The non-greedy match did the trick! My final regexp: /(?m)(<table[^>]*>(?:.|\n|\r)*?<\/table>)/ I'm not that familiar with the DOM parser, but I'll try to implement that version. If I get it working, I'll use it instead. Thanks a lot :) – Jozze Jul 31 '14 at 11:35
You are welcome. Just copy the code I've posted. The example aims to be working code. – hek2mgl Jul 31 '14 at 11:36
Doesn't work for me... at least for now. There seem to be some namespace errors: it can't find DOMDocument(). Maybe the PHP extension isn't installed, or something like that. But the regex works for now, and I'll try to change it again when our senior developer comes back. I'll try to remember to post the result here when it's done. Thanks again! – Jozze Jul 31 '14 at 15:08
@Jozze If you are working in a namespace, you need to use `\DOMDocument`. Note the leading `\`, which refers to the global PHP namespace. – hek2mgl Jul 31 '14 at 15:09
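A short sketch of the namespace fix; the namespace MyApp\Content is hypothetical. Without the leading backslash, PHP resolves DOMDocument relative to the current namespace and fails to find the class.

```php
<?php
namespace MyApp\Content; // hypothetical namespace, for illustration only

// An unqualified "DOMDocument" here would resolve to MyApp\Content\DOMDocument,
// which doesn't exist. The leading backslash names the global class instead.
$doc = new \DOMDocument();

echo get_class($doc), "\n"; // DOMDocument
```

Adding `use DOMDocument;` below the namespace declaration has the same effect and lets you keep writing `new DOMDocument()`. If the class is still missing after that, the dom extension may not be enabled.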

score 0 · Answer 2 · answered Jul 31 '14 at 08:14


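You could use a lookahead if you don't want the end tag to be part of the match. Combined with a lazy quantifier, the match stops right before the first closing tag:

preg_match_all('/(<table[^>]*>(?:.|\n)*?(?=<\/table>))/', $aFile['sContent'], $aMatches);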
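A quick PHP check of the lookahead idea on a made-up string with two tables. With a greedy quantifier, the lookahead only excludes the very last </table>, so everything from the first table up to it comes back as one match. With a lazy quantifier, each match stops just before its own closing tag, which the lookahead tests but does not consume.

```php
<?php
$html = "<table>\n<tr><td>A</td></tr>\n</table>\n<p>text</p>\n<table>\n<tr><td>B</td></tr>\n</table>";

// Greedy + lookahead: still runs to the LAST </table>; only that tag is left out
preg_match_all('/<table[^>]*>(?:.|\n)*(?=<\/table>)/', $html, $greedy);

// Lazy + lookahead: stops before the FIRST </table> after each opening tag
preg_match_all('/<table[^>]*>(?:.|\n)*?(?=<\/table>)/', $html, $lazy);

echo count($greedy[0]), "\n"; // 1
echo count($lazy[0]), "\n";   // 2
```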

answered Jul 31 '14 at 08:14

Avinash Raj
