I want to remove every <br /> inside a table using PHP. I know str_replace() can remove <br />, but it would remove all of them. I only want to remove the ones between <table> and </table>, and I have several tables in one string.

The HTML is below. You can also see this fiddle.

<p>Some text before table:</p><table cellpadding="0" cellspacing="0"><br /> <tbody><br />       <tr><br />          <td><br />          <p><strong>column1</strong></p>         </td><br />         <td><br />          <p><strong>column2</strong></p>         </td></tr><br />        <tr><br />          <td><br />          <p>1</p>            </td><br />         <td><br />          <p>2</p>            </td><br />         <br />      </tr><br /> </tbody><br /></table>

<p>Some text before table:</p><table cellpadding="0" cellspacing="0"><br /> <tbody><br />       <tr><br />          <td><br />          <p><strong>column1</strong></p>         </td><br />         <td><br />          <p><strong>column2</strong></p>         </td></tr><br />        <tr><br />          <td><br />          <p>1</p>            </td><br />         <td><br />          <p>2</p>            </td><br />         <br />      </tr><br /> </tbody><br /></table>

I tried the following approach. Is this the best solution?

<?php
    $input = '<p>Some text before table:</p><table cellpadding="0" cellspacing="0"><br /> <tbody><br />       <tr><br />          <td><br />          <p><strong>column1</strong></p>         </td><br />         <td><br />          <p><strong>column2</strong></p>         </td></tr><br />        <tr><br />          <td><br />          <p>1</p>            </td><br />         <td><br />          <p>2</p>            </td><br />         <br />      </tr><br /> </tbody><br /></table>

<p>Some text before table:</p><table cellpadding="0" cellspacing="0"><br /> <tbody><br />       <tr><br />          <td><br />          <p><strong>column1</strong></p>         </td><br />         <td><br />          <p><strong>column2</strong></p>         </td></tr><br />        <tr><br />          <td><br />          <p>1</p>            </td><br />         <td><br />          <p>2</p>            </td><br />         <br />      </tr><br /> </tbody><br /></table>';


$body = preg_replace_callback("~<table\b.*?/table>~si", "process_table", $input);

function process_table($match) {

        return str_replace('<br />', '', $match[0]);

}

echo $body;
Tester

2 Answers

As stated here, "Regex is not a tool that can be used to correctly parse HTML". However, since a regex solution was asked for, here is one that works for this controlled case. It includes debug code that shows the before and after.

Note: I also tested with your regex, /<table\b.*?<\/table>/si, in the preg_match() call, and it works as well.

<?php

$search ='<p>Some text before table:</p><table cellpadding="0" cellspacing="0"><br /> <tbody><br />       <tr><br />          <td><br />          <p><strong>column1</strong></p>         </td><br />         <td><br />          <p><strong>column2</strong></p>         </td></tr><br />        <tr><br />          <td><br />          <p>1</p>            </td><br />         <td><br />          <p>2</p>            </td><br />         <br />      </tr><br /> </tbody><br /></table>

<p>Some text before table:</p><table cellpadding="0" cellspacing="0"><br /> <tbody><br />       <tr><br />          <td><br />          <p><strong>column1</strong></p>         </td><br />         <td><br />          <p><strong>column2</strong></p>         </td></tr><br />        <tr><br />          <td><br />          <p>1</p>            </td><br />         <td><br />          <p>2</p>            </td><br />         <br />      </tr><br /> </tbody><br /></table>';

$search = replacebr($search);

function replacebr($search){
    $offset = 0;        // byte position where the next search starts
    $anew = array();    // pieces of the rebuilt string
    $i = 0;

    echo $search;

    while (true) {
        // "s" lets "." span newlines, "i" also matches <TABLE>; \b accepts a bare <table>
        $found = preg_match('/<table\b[^>]*>(.+?)<\/table>/si', $search, $amatch, PREG_OFFSET_CAPTURE, $offset);
        if ($found !== 1) {
            // no more tables (or a regex error): add the last part of the string and stop
            $anew[] = substr($search, $offset);
            break;
        }
        echo "amatch: "; var_dump($amatch);

        // add part before match
        $anew[] = substr($search, $offset, $amatch[0][1] - $offset);
        echo "anew (before): "; var_dump($anew[count($anew) - 1]);

        // add match with replaced text
        $anew[] = str_replace("<br />", "", $amatch[0][0]);
        echo "anew (match): "; var_dump($anew[count($anew) - 1]);

        // PREG_OFFSET_CAPTURE reports byte offsets, so advance with strlen(), not mb_strlen()
        $offset = $amatch[0][1] + strlen($amatch[0][0]);
        echo "OFFSET: "; var_dump($offset);

        if (++$i == 100) {
            // prevent endless loop
            die("Endless Loop");
        }
    }

    $new = implode("", $anew);
    echo "*******************";
    echo $new;
    return $new;
}


?>
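
Since the caveat at the top of this answer is that regex is not the right tool for parsing HTML, here is a minimal sketch of the same cleanup using PHP's bundled DOM extension instead. The function name strip_br_in_tables and the sample input are made up for illustration. It assumes ASCII input (loadHTML() needs an encoding hint otherwise) and that the parser leaves a stray <br /> where it found it inside the table, which is worth checking against your real markup.

```php
<?php
// Hypothetical helper: remove every <br> that has a <table> ancestor.
function strip_br_in_tables(string $html): string
{
    $previous = libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $doc = new DOMDocument();
    // Assumption: wrapping the fragment in <html><body> is acceptable for this input.
    $doc->loadHTML('<html><body>' . $html . '</body></html>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Copy the result into an array so removing nodes cannot disturb the iteration.
    foreach (iterator_to_array($xpath->query('//table//br')) as $br) {
        $br->parentNode->removeChild($br);
    }

    // Serialize only the children of <body>, dropping the wrapper added above.
    $out = '';
    foreach ($doc->getElementsByTagName('body')->item(0)->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

// Sample input: one <br /> outside the table, two inside it.
$clean = strip_br_in_tables('<p>a<br />b</p><table><br /><tr><td>x<br />y</td></tr></table>');
echo $clean;
```

Note that the serializer may normalize the markup (for example writing <br> instead of <br />), so the output is not guaranteed to be byte-identical to the input outside the tables.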
mseifert

I don't recommend parsing HTML with regex, but if you have to, this might work.

Note: the test case is in Perl, but the regex will work in PHP.
Just globally replace each match with $1.

 #  '~(?s)((?:(?!\A|<table\b)\G|<table\b)(?:(?!<br\s*/>|</table\b).)*)<br\s*/>(?=.*?</table\b)~'

 (?s)                         # Dot-All
 (                            # (1 start), Keep these
      (?:
           (?! \A | <table \b )
           \G                           # Start match from end of last match
        |                               # or,
           <table \b                    # Start from '<table\b'
      )
      (?:                          # The chars before <br/ or </table  end tags
           (?!
                <br \s* /> 
             |  </table \b 
           )
           . 
      )*
 )                            # (1 end)
 <br \s* />                   # Strip <br/>
 (?= .*? </table \b )         # Must be </table end tag downstream

Perl test case

$/ = undef;

$str = <DATA>;

print "Before:\n$str\n\n";
$str =~ s~(?s)((?:(?!\A|<table\b)\G|<table\b)(?:(?!<br\s*/>|</table\b).)*)<br\s*/>(?=.*?</table\b)~$1~g;
print "After:\n$str\n\n";

__DATA__
<p>Some text before table:</p><table cellpadding="0" cellspacing="0"><br /> <tbody><br />       <tr><br />          <td><br />          <p><strong>column1</strong></p>         </td><br />         <td><br />          <p><strong>column2</strong></p>         </td></tr><br />        <tr><br />          <td><br />          <p>1</p>            </td><br />         <td><br />          <p>2</p>            </td><br />         <br />      </tr><br /> </tbody><br /></table>

Output >>

Before:
<p>Some text before table:</p><table cellpadding="0" cellspacing="0"><br /> <tbody><br />       <tr><br />          <td><br />          <p><strong>column1</strong></p>         </td><br />         <td><br />          <p><strong>column2</strong></p>         </td></tr><br />        <tr><br />          <td><br />          <p>1</p>            </td><br />         <td><br />          <p>2</p>            </td><br />         <br />      </tr><br /> </tbody><br /></table>

After:
<p>Some text before table:</p><table cellpadding="0" cellspacing="0"> <tbody>       <tr>          <td>          <p><strong>column1</strong></p>         </td>         <td>          <p><strong>column2</strong></p>         </td></tr>        <tr>          <td>          <p>1</p>            </td>         <td>          <p>2</p>            </td>               </tr> </tbody></table>
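
For PHP, the same substitution can be sketched with preg_replace(), which replaces every match by default, so no /g equivalent is needed. The pattern is the one from this answer, unchanged. The sample string is made up for illustration and is shorter than the question's HTML; it also includes a <br/> without the space to exercise the \s* part.

```php
<?php
// Pattern copied from the answer; single quotes keep the backslashes intact.
$pattern = '~(?s)((?:(?!\A|<table\b)\G|<table\b)(?:(?!<br\s*/>|</table\b).)*)<br\s*/>(?=.*?</table\b)~';

// Hypothetical sample input: one <br /> before the table, three inside it, one after it.
$html = '<p>a<br />b</p><table><br /><tr><td>x<br/>y</td></tr><br /></table><p>c<br />d</p>';

// '$1' puts back everything captured before the <br />, so only the <br /> itself is dropped.
$result = preg_replace($pattern, '$1', $html);

if ($result === null) {
    // preg_replace() returns null on failure, e.g. when the backtrack limit is hit on large input
    throw new RuntimeException('preg_replace failed, error code ' . preg_last_error());
}

echo $result;
```

The \G anchor is what chains the matches together: each one has to start either at a <table tag or exactly where the previous match ended, so a <br /> outside a table is never reached.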