I'm making a plugin that will create a table based on some pre-existing data.

Sometimes that data contains line breaks (\n or \r) in the middle, which is out of my control. After I finish parsing the data, the software will replace all line breaks with <br>, so I need to remove every line break that is not inside a <th> or <td>.

This regex will match all of them:

(>[^<]*)\n([^<]*<)

How can I make it match all line breaks except the ones inside <td></td> and <th></th>?

Thank you

Cornwell

2 Answers

Use the regex below, then replace the matched \n characters with an empty string.

<(th|td)>.*?<\/\1>(*SKIP)(*F)|\n

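<(th|td)>.*?<\/\1> matches a complete th or td element, from its opening tag to the matching closing tag. The (*SKIP)(*F) that follows forces that match to fail and tells the engine not to retry from any position inside the skipped text, so the search resumes right after the closing tag. The engine then tries the other side of the alternation, \n, on the remaining string, which means only the newline characters outside the td and th elements are matched. Because cell contents can span several lines, the pattern needs the s (DOTALL) modifier so that . also matches newlines.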
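For comparison, the same "match the cell, but leave it alone" idea can be written without backtracking verbs. The sketch below is not the approach from this answer: it swaps in preg_replace_callback, and the function name removeNewlinesOutsideCells is made up for illustration. It assumes PHP 7.4+ (arrow functions) and opening tags without attributes.

```php
<?php
// Hypothetical helper: an alternative to (*SKIP)(*F) using a callback.
function removeNewlinesOutsideCells(string $html): string
{
    $result = preg_replace_callback(
        // Either a whole <th>/<td> element, or a single newline.
        '~<(th|td)>.*?</\1>|\n~s',
        // Group 1 is set only when a whole cell matched: put the cell back
        // unchanged. Otherwise the match is a bare newline: drop it.
        fn(array $m): string => ($m[1] ?? '') !== '' ? $m[0] : '',
        $html
    );

    // preg_replace_callback returns null on a PCRE error; fall back to the input.
    return $result ?? $html;
}

echo removeNewlinesOutsideCells("<table>\n<tr><td>cell \ncontent</td></tr>\n</table>");
```

The callback version is more verbose, but it may be easier to port to regex engines that do not support (*SKIP) and (*F).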

Example:

$string = <<<EOT
<table>
<tr><th>HEader 1</th><th> header 
2</th>
</tr>
<tr><td>cell 
content</td><td>cell 2</td></tr>
</table>
EOT;
echo preg_replace('~<(th|td)>.*?<\/\1>(*SKIP)(*F)|\n~s', '', $string);

Output:

<table><tr><th>HEader 1</th><th> header 
2</th></tr><tr><td>cell 
content</td><td>cell 2</td></tr></table>
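The question also mentions \r, which the pattern above does not touch. A minimal sketch of one way to cover it, assuming the data may contain \r\n, a lone \r, or a lone \n; the function name stripBreaksOutsideCells is hypothetical:

```php
<?php
// Hypothetical helper: same (*SKIP)(*F) pattern, with the newline branch
// widened to \r\n, lone \r, and lone \n.
function stripBreaksOutsideCells(string $html): string
{
    // \r\n? matches CRLF or a bare CR; |\n covers a bare LF.
    $result = preg_replace('~<(th|td)>.*?</\1>(*SKIP)(*F)|\r\n?|\n~s', '', $html);

    // preg_replace returns null on a PCRE error; fall back to the input.
    return $result ?? $html;
}

$input = "<table>\r\n<tr><td>cell \r\ncontent</td></tr>\r\n</table>";
echo stripBreaksOutsideCells($input);
```

Line breaks inside the cell are left as they are, in whatever form they had in the input.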

Avinash Raj

There is a simpler way, provided the th and td cells contain no nested tags: match a newline only when it is not followed by plain text and then a closing cell tag.

\n(?![^<]*<\/(?:th|td)>)
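To try the lookahead idea on a small table, here is a minimal sketch. It writes the lookahead with [^<]* so that it can only scan plain text up to the next tag, which is an assumption on my part about what "no tags in between" requires. The function name stripNewlinesByLookahead is hypothetical.

```php
<?php
// Hypothetical helper: remove a newline unless only plain text separates it
// from a closing </th> or </td>. Assumes cells contain no nested tags.
function stripNewlinesByLookahead(string $html): string
{
    $result = preg_replace('~\n(?![^<]*</(?:th|td)>)~', '', $html);

    // preg_replace returns null on a PCRE error; fall back to the input.
    return $result ?? $html;
}

$html = "<table>\n<tr><th>HEader 1</th><th> header \n2</th>\n</tr>\n</table>";
echo stripNewlinesByLookahead($html);
```

If a cell does contain markup such as <b>, a newline before that inner closing tag no longer looks like it is inside a cell, so it gets removed as well. That is the limitation the (*SKIP)(*FAIL) patterns avoid.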

In case there are tags in between, you can use Avinash's approach, or this variant, which also uses the (*SKIP)(*FAIL) trick but allows any number of attributes on the opening tag (the \b keeps <thead> from being treated as <th>):

(?s)<(t[hd])\b[^>]*?>.*?<\/\1>(*SKIP)(*FAIL)|\n

With input as

<table>
<tr><th>HEader 1</th><th> header 
2</th>
</tr>
<tr><td width="100"><b>cell 
content</b></td><td>cell 2</td></tr>
</table>

Output is

<table><tr><th>HEader 1</th><th> header 
2</th></tr><tr><td width="100"><b>cell 
content</b></td><td>cell 2</td></tr></table>
Wiktor Stribiżew