0

Hoping this isn't a duplicate; I couldn't find an original question on the topic. If you have an area where users can input data, how do you store and retrieve that data so that any JavaScript or HTML they insert does not run or render?

As an example, say a user is making a forum post. They decide to write an HTML list or a JavaScript function that runs when the post is viewed. How do you mitigate this when you receive their input on the server side? Specifically, a PHP server side.

  • Remove parts of their string data based on patterns?
  • Use an html tag around their entry like ?

Thanks

Paul
Spidy
  • See http://stackoverflow.com/questions/223480/html-encode-user-input-when-storing-or-when-displaying and http://stackoverflow.com/questions/3129899/what-are-the-common-defenses-against-xss and http://stackoverflow.com/questions/34896/when-is-it-best-to-sanitize-user-input – Josh Lee May 06 '11 at 04:16
  • None of those posts answer how to do it in php – Spidy May 06 '11 at 04:19
  • PHP, you say? http://stackoverflow.com/questions/tagged/xss+php; I was specifically looking for language agnostic questions, since you didn’t specify one. – Josh Lee May 06 '11 at 04:23

4 Answers

3

The bare minimum is to replace < with &lt; in anything the user submitted before it is written into the page.
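
A minimal sketch of that replacement in PHP, assuming the text is UTF-8 and is being written into an HTML page. The function name escape_html is made up for illustration; htmlspecialchars() is the built-in that does the work, and it also encodes &, > and quote characters, which is a little more than the bare minimum described above.

```php
<?php
// Hypothetical helper name; htmlspecialchars() is PHP's built-in encoder.
function escape_html(string $text): string
{
    // ENT_QUOTES also encodes ' and ", which matters inside attribute values.
    // ENT_SUBSTITUTE replaces invalid UTF-8 instead of returning an empty string.
    return htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// A post that tries to run script when viewed.
$post = '<script>alert("hi")</script>';

// Prints: &lt;script&gt;alert(&quot;hi&quot;)&lt;/script&gt;
echo escape_html($post), "\n";
```

The browser shows the encoded result as literal text, so the script tag is displayed rather than executed.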

jpsimons
2

You have to remove or translate the offending parts of their post. You can do it once as the post is coming in, and save the translated post in the database, or you can do it every time you display the post, and store the raw post in the database. Both approaches have their good and bad points.

As for how to neutralize the bad stuff, replacing every < and > with &lt; and &gt; goes a long way, but it is not the whole job: & should become &amp;, and quote characters need encoding whenever the text ends up inside an attribute value.
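
Here is a rough sketch of the second approach: keep the raw post and translate it each time it is displayed. The $stored array stands in for a database row and render_post is a hypothetical name, so treat both as assumptions rather than a recommended design.

```php
<?php
// Stand-in for a database row: the raw post is kept exactly as submitted.
$stored = [
    'title' => 'Tips & "tricks"',
    'body'  => '<ul><li onclick="steal()">item</li></ul>',
];

// Hypothetical renderer: each value is encoded at the moment it enters HTML.
function render_post(array $post): string
{
    $title = htmlspecialchars($post['title'], ENT_QUOTES, 'UTF-8');
    $body  = htmlspecialchars($post['body'], ENT_QUOTES, 'UTF-8');

    // $title lands inside a double-quoted attribute, so the encoded quotes
    // are what stop it from closing the attribute early.
    return '<article title="' . $title . '"><p>' . $body . '</p></article>';
}

echo render_post($stored), "\n";
```

Because the stored copy is untouched, the same row can still be exported to non-HTML targets without undoing any encoding first.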

Ernest Friedman-Hill
  • Is there a list somewhere of items to replace to protect the site? Is it just < and > – Spidy May 06 '11 at 04:21
  • What more to do is there? No HTML gets through if you escape the less-than sign. – jpsimons May 06 '11 at 04:22
  • On *when* to do it: my preference is to store the raw post and translate/encode on later display. But maybe that's because I often work on systems where the data has to be accessible for other (non-HTML) purposes. – nnnnnn May 06 '11 at 06:39
2

I use HTML Purifier to strip out the bits I don't want and leave in the bits I do. The default rules are pretty good, but it offers enormous flexibility if you need it.

El Yobo
2

There are lots of tutorials out there on preventing code injection. Microsoft's, found here, is pretty comprehensive.

For HTML injection, depending on how thorough you want to be, you can usually just add a string parser that checks for < and > and removes every one it finds, with no exceptions.
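
As a sketch of that removal approach in PHP, assuming no allow-list at all: the helper name remove_angle_brackets is invented for this example, and str_replace() is the built-in doing the work.

```php
<?php
// Hypothetical helper: deletes every < and > with no exceptions.
function remove_angle_brackets(string $text): string
{
    return str_replace(['<', '>'], '', $text);
}

// Prints: scriptalert(1)/script
echo remove_angle_brackets('<script>alert(1)</script>'), "\n";

// Legitimate comparisons lose their operators too.
// Prints: if a  b then c  d
echo remove_angle_brackets('if a < b then c > d'), "\n";
```

The second line shows the trade-off: removal is simple, but it also changes harmless text that happens to contain those characters.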

Daniel Nill
  • sorry, didn't see you were working in php, but googling "preventing javascript php injections" brings up a handful of SO links so be assured there is plenty of info out there. – Daniel Nill May 06 '11 at 04:29