2

I know this topic has been discussed quite extensively, as I've gone through and read more than 15 posts on the subject, but still can't find an answer to my question.

I'm looking for a function to sanitize data from a form. As absolutely NO HTML will be acceptable, how do I go about escaping ALL html entities so the user can absolutely not inject anything? I don't need a white list, as no input HTML is allowed.

Also, there's no need to run mysql_real_escape_string, as I don't use a MySQL database; I use MongoDB. I'm just storing first name, last name, phone numbers, basic stuff. No HTML. But I still don't want a user to be able to enter <script>whatever</script> as their first name and have the browser parse it when it's displayed back to them.

I thought about HTML Purifier and htmLawed, but they seem to be too much for what I'm trying to do. Do I just build a fancy preg_replace function?

mrc0der
  • Well isn't this why you should be sanitizing stuff that the user has provided before displaying in HTML? Namely something like: – thatidiotguy Oct 25 '12 at 15:00

7 Answers

2

There is no universal "make it safe" filter. Strings are only dangerous when placed into a specific context.

For example, if the context is a plain text document, you don't really have any worries.

htmlspecialchars is enough if the context is a text node (not within angle brackets). Specify the correct charset/encoding, which is the one declared in the HTTP headers sent by your server.

ok

   <p><?= htmlspecialchars($input, ENT_QUOTES, 'UTF-8'); ?></p>

But if you need to output inside of angle brackets, making the context something like an HTML attribute, for example:

<p <?= htmlspecialchars($input, ENT_QUOTES, 'UTF-8'); ?>   ></p>
or
<p title="<?= htmlspecialchars($input, ENT_QUOTES, 'UTF-8'); ?>"   ></p>

The "make it safe" task then becomes, in many cases, extremely difficult (legacy browsers have some absolutely bewildering bugs that defy the common expectations of software developers). You would be foolish not to stand on the shoulders of giants and make use of something like HTML Purifier.
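As a rough illustration of why the context matters, here is a minimal sketch. The helper name escape_html and the sample inputs are made up for this example; only htmlspecialchars itself comes from the answer above.

```php
<?php
// Hypothetical helper: htmlspecialchars with explicit flags and charset.
function escape_html(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// Text-node context: the markup characters become entities.
$name = '<script>alert(1)</script>';
echo '<p>' . escape_html($name) . '</p>', PHP_EOL;
// <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>

// Inside a tag, outside of quotes: this input contains no special
// characters at all, so escaping returns it unchanged and it would
// be read as an attribute if printed inside the angle brackets.
$attr = 'x onmouseover=alert(1)';
var_dump(escape_html($attr) === $attr); // bool(true)
```

So the same escaping call that is fine for a text node does nothing to protect the unquoted-attribute case.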

goat
  • "legacy browsers have some absolutely bewildering bugs": can you name one that exists, and its browser version? I test my company's e-commerce site down to IE6 and browsers from that era, so I am quite interested to know this. EDIT: in relation to htmlspecialchars of course :) – Sammaye Oct 25 '12 at 16:35
  • @Sammaye some time in the past there was ha.ckers.org/xss.html which included browser version. It now redirects to https://www.owasp.org/index.php/XSS_Filter_Evasion_Cheat_Sheet which doesn't list browser versions anymore. maybe try waybackmachine.org to find it. my jaw dropped the first time i saw the dword encoding... – goat Oct 25 '12 at 16:48
  • Damn I gotta try some of these, thanks! Though htmlpurifier is such a slow and cpu consuming library :(, it is so hard to use it for all input. – Sammaye Oct 25 '12 at 16:52
0

I'm no expert on such things, but couldn't you just str_replace the angle brackets?

Zach Lysobey
0

I would say use preg_replace but you'd need to be careful of accents and other uncommon characters that can appear in a person's name.
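A minimal sketch of what that could look like, assuming UTF-8 input. The function name clean_name and the set of allowed characters are my own assumptions, not a vetted rule for real names:

```php
<?php
// Hypothetical helper: keep letters from any script, combining marks,
// spaces, apostrophes and hyphens; drop everything else.
function clean_name(string $name): string
{
    // The /u modifier makes PCRE treat pattern and subject as UTF-8,
    // so \p{L} matches accented letters such as "é" or "ö".
    $cleaned = preg_replace('/[^\p{L}\p{M}\' -]/u', '', $name);

    // preg_replace returns null on error, e.g. for invalid UTF-8.
    return $cleaned === null ? '' : trim($cleaned);
}

echo clean_name('José <b>'), PHP_EOL;            // José b
echo clean_name("Zoë O'Brien<script>"), PHP_EOL; // Zoë O'Brienscript
```

Note that the apostrophe survives, so the result would still need escaping when it is printed into HTML.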

aaron-bond
0

Define sanitize: Do you want to escape the angle brackets or do you want to remove HTML tags?

To escape, take a look at

htmlentities()

To remove have a look at

strip_tags()
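
To make the difference concrete, a small sketch (the sample input is made up for illustration):

```php
<?php
$input = '<b>Anna</b> & <script>alert(1)</script>';

// Escaping keeps every character but makes it inert as markup.
$escaped = htmlentities($input, ENT_QUOTES, 'UTF-8');
echo $escaped, PHP_EOL;
// &lt;b&gt;Anna&lt;/b&gt; &amp; &lt;script&gt;alert(1)&lt;/script&gt;

// Stripping removes the tags but keeps the text between them.
$stripped = strip_tags($input);
echo $stripped, PHP_EOL;
// Anna & alert(1)
```

The stripped string still contains a bare &, so it is not ready to be printed into HTML as-is.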
mineichen
  • is strip_tags() enough to stop XSS? – mrc0der Oct 25 '12 at 15:07
  • No, `strip_tags` has a few flaws with unicode encoding if I remember right. Some would argue the only way to truly stop XSS is not to attempt to remove anything but instead to encode, as `htmlspecialchars` or `htmlentities` does – Sammaye Oct 25 '12 at 15:10
  • Why not combine these two methods? First strip all tags, and then run htmlentities() over the result just to be on the safe side... – mineichen Oct 25 '12 at 15:12
0

One I like to use, which simply encodes ALL HTML special characters so that they are no longer treated as markup on the page, is:

htmlspecialchars($string);

It has never let me down, it avoids complex and slow replacement functions, and it means the user can use > in their username or comment without it being removed (e.g. >3 is a perfectly valid username on the internet).
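
A short sketch of that idea, with a made-up username. One assumption worth stating: the flags and charset are passed explicitly here, because the default flags of htmlspecialchars changed in PHP 8.1 (single quotes were not escaped by default before that).

```php
<?php
// Hypothetical stored value, kept exactly as the user typed it.
$username = '>3 "fan"';

// Explicit flags and charset, so the result does not depend on
// the defaults of the PHP version in use.
$safe = htmlspecialchars($username, ENT_QUOTES, 'UTF-8');

echo '<li>' . $safe . '</li>', PHP_EOL;
// <li>&gt;3 &quot;fan&quot;</li>
```

The browser shows the original characters, and nothing was removed from what the user entered.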

Sammaye
0

What about looking into PHP's Data Filtering, http://php.net/manual/en/book.filter.php

Sanitize: http://php.net/manual/en/filter.filters.sanitize.php

If you really want a solid and safe library, check out OWASP's ESAPI for PHP

Don’t write your own security controls! Reinventing the wheel when it comes to developing security controls for every web application or web service leads to wasted time and massive security holes. The OWASP Enterprise Security API (ESAPI) Toolkits help software developers guard against security‐related design and implementation flaws.

Anthony Hatzopoulos
0

Use PHP's filter_input (available since PHP 5.2): http://php.net/manual/en/function.filter-input.php

$string = filter_input(INPUT_POST, 'string', FILTER_SANITIZE_SPECIAL_CHARS);

This is pretty much like reading $_POST['string'], but with a built-in sanitizer.
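
Since filter_input only reads from a real request, here is a sketch that uses filter_var with the same filter so it can be run from the command line. The $raw value is a made-up stand-in for the posted field.

```php
<?php
// Hypothetical raw value, standing in for $_POST['string'].
$raw = '<script>alert("hi")</script>';

// filter_var applies the same filter to a value you already hold.
$clean = filter_var($raw, FILTER_SANITIZE_SPECIAL_CHARS);

// In a request handler the equivalent call would be:
// $clean = filter_input(INPUT_POST, 'string', FILTER_SANITIZE_SPECIAL_CHARS);
// which returns null when the field is missing and false when the
// filter fails, so both cases should be checked before use.

// Check that no raw markup characters are left in the result.
var_dump(strpbrk($clean, '<>"') === false);
```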

Johndave Decano