4

Question: How do I strip HTML tags but keep the greater-than and less-than signs using PHP?

If I use PHP's strip_tags() function, it doesn't quite work:

$string = '<p>if A > B</p>';
echo strip_tags($string);  // if A B
// but I want to output "if A > B"

UPDATE

Basically, I only want to allow/display plain text.

TimTim
  • 13
    You know you shouldn't have < and > in HTML anyway? You should use character entities instead, such as &lt; and &gt;. The browser will render them as < and >. – Tamas Czinege Jan 03 '10 at 23:01
  • DrJokepu is correct. Your snippet is invalid HTML. – SpliFF Jan 03 '10 at 23:04
  • @DrJokepu, so if I use htmlspecialchars(), that encodes the > to &gt; but doesn't strip tags. Basically, I only want to allow plain text. What's the simplest way to do that? – TimTim Jan 03 '10 at 23:52
  • @SpliFF - no it isn't invalid. Add a doctype and a title element, and try it via the direct input box at http://validator.w3.org/check. – Alohci Jan 04 '10 at 01:21
  • 1
    Related question: http://stackoverflow.com/questions/1996344/is-preventing-xss-and-sql-injection-as-easy-as-does-this IMHO just leave `strip_tags()` aside and go ahead with `htmlspecialchars()`. No need to exaggerate this. – BalusC Jan 04 '10 at 01:50

5 Answers

4

You can use HTML Purifier. It will not only work with the <p>if A > B</p> example you wrote, but also with the example <p>1<2 && 6>4</p> written by DrJokepu.

Given the input <p>1<2 && 6>4</p> with the allowed elements set to none, HTML Purifier gives the output: 1&lt;2 &amp;&amp; 6&gt;4.

Tommy Andersen
0

This will strip everything that looks like an HTML tag.

htmlentities(preg_replace('/<\\S.*?>/', '', $text));
amphetamachine
  • 2
    This will fail on

    1<2 && 6>4

    - see http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454
    – Tamas Czinege Jan 03 '10 at 23:11
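
To make that failure concrete, here is a small sketch that runs the pattern from the answer on the input from the comment. The variable names are only illustrative.

```php
<?php
// The pattern from the answer: "<", one non-space character,
// then the shortest run of characters up to the next ">".
$pattern = '/<\\S.*?>/';

$input = '1<2 && 6>4';

// "<2 && 6>" looks like a tag to the regex, so it is removed
// together with the comparison operators.
$stripped = preg_replace($pattern, '', $input);

echo htmlentities($stripped), "\n"; // prints "14"
```

The expression was never markup, but the regex cannot tell the difference, which is the edge case the comment points at.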
0

Unfortunately, the simplest and most reliable way to get this working is to use an HTML parser. This one will do the trick. I don't know if it'll handle HTML fragments like the above. If not, then wrapping the fragment to make it acceptable HTML should be trivial.

As others point out, parsing HTML with a regexp is difficult and has numerous edge cases to cater for, since HTML is not a regular language.
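
As a rough illustration of the parser approach, here is a minimal sketch that uses PHP's bundled DOM extension rather than the parser linked in this answer. The helper name html_to_text() is hypothetical, and the `<?xml encoding="UTF-8">` prefix is a common workaround (an assumption here, not something from the answer) to make loadHTML() read the input as UTF-8.

```php
<?php
// Hypothetical helper: parse an HTML fragment and return only its text.
function html_to_text(string $html): string
{
    $doc = new DOMDocument();

    // Fragments are rarely valid documents; collect parser warnings
    // internally instead of emitting them as PHP warnings.
    $previous = libxml_use_internal_errors(true);

    // The prefix hints the encoding; the parser adds <html>/<body> itself.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // textContent concatenates all text nodes below <body>, with tags dropped.
    $body = $doc->getElementsByTagName('body')->item(0);
    return $body === null ? '' : $body->textContent;
}

// Escape the extracted text again before putting it back into a page.
echo htmlspecialchars(html_to_text('<p>if A > B</p>')), "\n";
// prints: if A &gt; B
```

I have not checked how this treats a stray < such as in 1<2 && 6>4; that may depend on the underlying libxml version, so test it against your own input.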

Brian Agnew
0

Try this regular expression that I wrote: <([^>]?="(\"|[^"])?")?([^>]?=''(\''|[^''])?'')?[^>]*?>

shellster
0

Use:

<p><?php echo htmlspecialchars("if A > B") ?></p>

(Of course, you can use any input instead of a literal string.)

htmlspecialchars() converts plain text to HTML text, preserving < and >.
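
A short sketch of what that means in practice. The ENT_QUOTES flag and the explicit 'UTF-8' charset are my additions, not part of the snippet above, and the variable names are illustrative.

```php
<?php
// Escaping alone: every "<", ">" and "&" becomes an entity,
// so markup is displayed as literal text rather than removed.
$escapedPlain = htmlspecialchars('if A > B', ENT_QUOTES, 'UTF-8');
echo $escapedPlain, "\n";
// prints: if A &gt; B

$escapedMarkup = htmlspecialchars('<p>1<2 && 6>4</p>', ENT_QUOTES, 'UTF-8');
echo $escapedMarkup, "\n";
// prints: &lt;p&gt;1&lt;2 &amp;&amp; 6&gt;4&lt;/p&gt;
```

Note that the second call does not strip the <p> tags: the browser shows them as text. If the tags must disappear from the output, escaping has to be combined with one of the stripping approaches from the other answers.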

Kornel