0

I want to remove HTML tags, along with the contents of style and script tags, but my code is not removing the contents of the style tags, and I don't know why. Any idea what is going wrong?

$search = array('@<script[^>]*?>.*?</script>@si',  // Strip out javascript 
               '@<[\/\!]*?[^<>]*?>@si',            // Strip out HTML tags 
               '@<style[^>]*?>.*?</style>@si',    // Strip style tags properly 
               '@<![\s\S]*?--[ \t\n\r]*>@'         // Strip multi-line comments including CDATA 
               ); 

$htmlstring = 'Which brand(s) of single serve coffee brewer do you own? <style type="text/css"> #answer67627X49X1159other {display:none;}</style>';
$htmlstring .= '<style> #answer67627X49X1159999 {display:none;}</style><script>alert(123);</script>';

$htmlstring = preg_replace($search,'',$htmlstring);

echo '<input style="width:90%" type="text" value="'.$htmlstring.'" />';

The following is the output in the input tag:

Which brand(s) of single serve coffee brewer do you own? #answer67627X49X1159other {display:none;} #answer67627X49X1159999 {display:none;}

Riz
  • 1) [Regex cannot handle this properly](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) 2) [Have a look at this library](http://simplehtmldom.sourceforge.net/). It should make your problem quite trivial (have a look at the examples). – Martin Ender Nov 07 '12 at 11:54
  • Why not just use strip_tags($htmlstring)? – Marty Nov 07 '12 at 11:55
  • @Marty: strip_tags removes HTML tags but not the contents of style and script tags. – Riz Nov 07 '12 at 11:56
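
To illustrate the point made in the last comment, here is a minimal sketch of what `strip_tags` does with a `style` block. The sample string and the variable names are made up for this illustration; only `strip_tags` itself is a real PHP function.

```php
<?php
// Illustrative input (not from the question): a style block followed by a bold word.
$mixed = 'Hi <style>p {color:red;}</style><b>there</b>';

// strip_tags() drops the tags themselves but keeps the text between them,
// so the CSS rule survives as plain text.
$plain = strip_tags($mixed);

var_dump($plain);
```

The tags disappear but the CSS text stays, which is why `strip_tags` alone does not solve the problem described in the question.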

2 Answers

0

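The pattern order is wrong. The generic tag pattern runs before the style pattern, so the opening and closing style tags are already gone by the time the style pattern is tried. Put the style pattern before the generic one: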

<?php
$search = array('@<script[^>]*?>.*?</script>@si',  // Strip out javascript 
               '@<style[^>]*?>.*?</style>@si',    // Strip style tags and their contents 
               '@<[\/\!]*?[^<>]*?>@si',            // Strip out HTML tags 
               '@<![\s\S]*?--[ \t\n\r]*>@'         // Strip multi-line comments including CDATA 
               ); 

$htmlstring = 'Which brand(s) of single serve coffee brewer do you own? <style type="text/css"> #answer67627X49X1159other {display:none;}</style>';
$htmlstring .= '<style> #answer67627X49X1159999 {display:none;}</style><script>alert(123);</script>';

$htmlstring = preg_replace($search, '' ,$htmlstring);
var_dump($htmlstring);

// string(57) "Which brand(s) of single serve coffee brewer do you own? "
Ene
0

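You've already stripped the HTML tags by the time you get to the style tags. preg_replace applies the patterns in array order, so the generic tag pattern removes the opening and closing style tags and leaves their contents behind, and the style pattern then has nothing to match. Change the order of your replacements so that script and style are handled before the rest: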

$search = array('@<script[^>]*?>.*?</script>@si',  // Strip out javascript                
                '@<style[^>]*?>.*?</style>@si',    // Strip style tags properly 
                '@<[\/\!]*?[^<>]*?>@si',            // Strip out HTML tags 
                '@<![\s\S]*?--[ \t\n\r]*>@'         // Strip multi-line comments including CDATA 
           ); 
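
As a rough sketch of how the reordered patterns above could be used end to end, the snippet below wraps them in a helper and escapes the result before placing it in the `value` attribute. The function name `strip_markup` and the variable names are hypothetical and not part of the original code, and the `htmlspecialchars` call is an extra precaution that the question's code does not include.

```php
<?php
// Hypothetical helper: the name strip_markup() is an assumption for this sketch.
function strip_markup(string $html): string
{
    // Order matters: script and style blocks go first, while their tags still exist.
    $search = array(
        '@<script[^>]*?>.*?</script>@si',  // Strip out javascript
        '@<style[^>]*?>.*?</style>@si',    // Strip style tags and their contents
        '@<[\/\!]*?[^<>]*?>@si',           // Strip out remaining HTML tags
        '@<![\s\S]*?--[ \t\n\r]*>@',       // Strip multi-line comments including CDATA
    );

    // preg_replace() returns null on a regex error; fall back to an empty string.
    return preg_replace($search, '', $html) ?? '';
}

// Same sample input as in the question.
$htmlstring  = 'Which brand(s) of single serve coffee brewer do you own? <style type="text/css"> #answer67627X49X1159other {display:none;}</style>';
$htmlstring .= '<style> #answer67627X49X1159999 {display:none;}</style><script>alert(123);</script>';

$clean = strip_markup($htmlstring);

// Escape before embedding in an attribute so stray quotes cannot break the markup.
echo '<input style="width:90%" type="text" value="' . htmlspecialchars($clean, ENT_QUOTES, 'UTF-8') . '" />';
```

With the sample input, only the question text (plus its trailing space) should remain in `$clean`.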
Crisp