
Whenever I display text in an HTML document, I always put it through htmlentities for a number of reasons. One reason is that if the text contains HTML, I want the browser to display the HTML code, not render it.

The application I am writing requires that I still encode with htmlentities, but hyperlinks need to be left alone.

Is there a way to do this efficiently using existing functions or do I need to implement this functionality?

emurano
    This is a conflicting statement: **I want the browser to display the markup, not render the HTML.** – bcosca Nov 14 '10 at 09:34
  • You should encode the HTML anyway, and you might be looking for a URL parser like http://stackoverflow.com/questions/1820870/php-complete-url-parser-help – Lekensteyn Nov 14 '10 at 09:38
    @stillstanding I think by "I want the browser to display the HTML code, not render it", he means that he wants the HTML code to be shown "as is", that is, showing the `<b>` on the screen as `<b>`, but does not want the browser to actually render the text inside as bold. – nonopolarity Nov 14 '10 at 10:09

2 Answers


The usual way is to pass any "possibly harmful data" through htmlspecialchars() before showing it as part of a webpage. You can do that for users' comments, notes, etc.
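A minimal sketch of that in PHP, assuming the page is served as UTF-8; the sample comment is made up. It also contrasts htmlspecialchars() with the htmlentities() used in the question: the former escapes only the characters that are special in HTML, while the latter also converts characters such as é into named entities.

```php
<?php
// Hypothetical user-supplied comment containing markup.
$comment = '<b>Hello</b> & "welcome"';

// ENT_QUOTES also escapes single quotes; UTF-8 is assumed as the page charset.
$safe = htmlspecialchars($comment, ENT_QUOTES, 'UTF-8');
echo $safe, "\n"; // &lt;b&gt;Hello&lt;/b&gt; &amp; &quot;welcome&quot;

// For comparison: htmlentities() additionally converts characters like é.
echo htmlentities('café', ENT_QUOTES, 'UTF-8'), "\n";     // caf&eacute;
echo htmlspecialchars('café', ENT_QUOTES, 'UTF-8'), "\n"; // café
```

The browser shows `$safe` as the literal text `<b>Hello</b> & "welcome"` instead of rendering bold text.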

For any URL that users entered, you can show it on screen using htmlspecialchars(). The URL will be displayed on screen as it is. (Any & will be escaped to &amp;, but when shown on screen it will appear as & again.) Maybe your concern is when it is used as a link, as in <a href="______">text</a>. In that case you can escape just the 4 characters < > " ' because you don't want an &amp; already in the URL to be escaped again into &amp;amp;, or you can use filter_var() to sanitize the URL: http://us3.php.net/manual/en/function.filter-var.php
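A sketch of the link case that combines both suggestions. make_link() is a hypothetical helper, and allowing only http/https links is an assumption, not part of the answer. Instead of escaping the four characters by hand, it passes false as the fourth argument of htmlspecialchars() (double_encode), which leaves existing entities such as &amp; untouched, and it uses filter_var() with FILTER_VALIDATE_URL to reject malformed URLs.

```php
<?php
// Hypothetical helper: build an <a> element from a user-supplied URL.
function make_link(string $url, string $text): ?string
{
    // Reject anything that is not a well-formed absolute URL.
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return null;
    }

    // Assumption: only http and https links are wanted.
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    if ($scheme !== 'http' && $scheme !== 'https') {
        return null;
    }

    // double_encode = false: escape < > " ' and bare &, but leave
    // existing entities such as &amp; alone (no &amp;amp;).
    $href = htmlspecialchars($url, ENT_QUOTES, 'UTF-8', false);
    $label = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    return '<a href="' . $href . '">' . $label . '</a>';
}

echo make_link('http://example.com/?a=1&b=2', 'Example'), "\n";
// <a href="http://example.com/?a=1&amp;b=2">Example</a>
```

Because double-encoding is off, passing `http://example.com/?a=1&amp;b=2` produces the same `href` rather than `&amp;amp;`.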

nonopolarity
  • You can roll your own format (or use BBCode, Markdown, or others).

  • You can parse HTML (using a proper library; not regex, please) and selectively keep all the <a> tags.

  • You can use regex to allow an HTML-like <a>-tag syntax, say in the form of

    <a href="..."[ rel="..."]>...</a>
    

    but keep in mind that it will not be HTML. (HTML allows rel to be specified before href, for starters.)
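The third option can be sketched without ever un-escaping arbitrary input: escape everything first, then turn back into real tags only the escaped text that matches the restricted form. escape_except_links() is hypothetical, and limiting `href` to http/https URLs and `rel` to lowercase words are assumptions. As the answer says, this accepts only this exact syntax, not general HTML.

```php
<?php
// Hypothetical helper: escape everything, then re-enable only links
// written in the restricted form <a href="http(s)://..."[ rel="..."]>label</a>
function escape_except_links(string $text): string
{
    // 1. Escape the whole input, so no raw markup survives.
    $escaped = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    // 2. In the escaped text an allowed link looks like
    //    &lt;a href=&quot;URL&quot;[ rel=&quot;REL&quot;]&gt;LABEL&lt;/a&gt;
    //    URL may not contain whitespace or an escaped quote; REL is limited
    //    to lowercase letters and spaces (assumption, e.g. "nofollow").
    $pattern = '~&lt;a href=&quot;(https?://(?:(?!&quot;)\S)+)&quot;'
             . '(?: rel=&quot;([a-z ]+)&quot;)?&gt;(.*?)&lt;/a&gt;~';

    // 3. Turn each match back into a real <a> tag. The captured URL and
    //    label are still escaped, so they are safe to emit as-is.
    return preg_replace_callback($pattern, function (array $m): string {
        $rel = $m[2] !== '' ? ' rel="' . $m[2] . '"' : '';
        return '<a href="' . $m[1] . '"' . $rel . '>' . $m[3] . '</a>';
    }, $escaped) ?? $escaped; // on a regex error, fall back to fully escaped text
}

echo escape_except_links('Hi <b>there</b>, see <a href="http://example.com/?a=1&b=2">this</a>'), "\n";
// Hi &lt;b&gt;there&lt;/b&gt;, see <a href="http://example.com/?a=1&amp;b=2">this</a>
```

Anything that does not match exactly, such as an `<a>` with an `onclick` attribute or a `javascript:` URL, stays escaped and is displayed as text.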

Also see this question, particularly the comments on my answer.

aib
  • Thanks, aib. I think writing a small parser is the way to go. And yes, trust me, I won't be using regex improperly :) – emurano Nov 14 '10 at 20:33