0

I'm creating an app that retrieves the text of a tweet, stores it in the database, and then displays it in the browser. My concern is that if the text contains PHP or HTML tags, it might open a security hole.

I looked into strip_tags() but saw some bad reviews. I also saw suggestions to use HTML Purifier, but it was last updated years ago.

So my question is how can I be 100% secure that if the tweet text is "<script> something_bad() </script>" it won't matter?

To state the obvious, the tweets are sent to the database by users, so I don't want to check each one individually before displaying it.

Sinister Beard
Micael Dias
  • HTML markup isn't dangerous to your *database* (though other content might be). The key is to sanitize user-supplied content in a contextually-appropriate way. Every context requires its own approach. – Pointy Mar 18 '15 at 14:00
  • php executes on the server. if you have some php code in a string, and do `echo $string_with_php`, then NOTHING will happen. it's just some text and will NOT get executed. browsers have no idea what php is, and don't care. – Marc B Mar 18 '15 at 14:00
  • To prevent XSS, you shouldn't worry about `` in the database, but about `` and not sanitizing the values before sending them to the browser – Alvaro Montoro Mar 18 '15 at 14:04
  • And how can I sanitize it @Alvaro ? – Micael Dias Mar 18 '15 at 14:08
  • That would depend where you are going to write the text: directly on the page, on a tag attribute, as CSS... Check this link: https://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet and read about how to sanitize HTML markup – Alvaro Montoro Mar 18 '15 at 14:11
  • `htmlentities` on output (or input...) – ʰᵈˑ Mar 18 '15 at 14:13
  • Need any more help with this? If so I'll update my answer. – SilverlightFox May 09 '15 at 10:39

4 Answers

1

You are NEVER 100% secure; however, you should take a look at this. If you also pass the ENT_QUOTES flag, there is currently no known way to inject XSS into your website, provided you use a valid charset (and your users don't use outdated browsers). However, if you want to allow people to post only SOME HTML tags in their "Tweet" (for example <b> for bold text), you will need to take a close look at EACH whitelisted tag.
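As a minimal sketch of the encoding step described above: this assumes the function being referred to is PHP's `htmlspecialchars()` (the answer does not name it here, so that is a guess), and `escape_html()` is a hypothetical helper name used only for illustration.

```php
<?php
// Hypothetical helper: encode a stored tweet for display in HTML.
// ENT_QUOTES encodes both single and double quotes; the explicit
// charset should match the one the page is actually served with.
function escape_html(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

$tweet = "<script> something_bad() </script>";

// Prints: &lt;script&gt; something_bad() &lt;/script&gt;
echo escape_html($tweet), "\n";
```

The browser renders the encoded string as visible text instead of parsing it as a script element.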

Eda190
1

You've passed the first stage, which is to recognise that there is a potential issue, and skipped straight to trying to find a solution without stopping to think about how you want to deal with this kind of content. That is a critical precursor to solving the problem.

The general rule is that you validate input and escape output.

validate input - decide whether to accept or reject it in its entirety

// Reject the input outright if HTML-encoding would change any of its characters.
if (htmlentities($input) != $input) {
    die("yuck! that tastes bad");
}

escape output - transform the data appropriately according to where it's going.

If you simply....

print "<script> something_bad() </script>";

That would be bad, but....

print json_encode(htmlentities("<script> something_bad() </script>"));

...then you would have to have done something very strange at the front end to make the client susceptible to a stored XSS attack.
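To illustrate "according to where it's going", here is a rough sketch that escapes the same string for two different destinations. The variable names are made up for the example, and the choice of `JSON_HEX_*` flags is one reasonable option, not the only one.

```php
<?php
$tweet = "<script> something_bad() </script>";

// HTML body context: encode the characters that are special in HTML.
$for_html = htmlspecialchars($tweet, ENT_QUOTES, 'UTF-8');

// JavaScript/JSON context: with the JSON_HEX_* flags, json_encode()
// emits <, >, &, ' and " inside the string as \uXXXX escapes, so the
// value cannot close a surrounding <script> block.
$for_js = json_encode(
    $tweet,
    JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT
);

echo $for_html, "\n";
echo $for_js, "\n";
```

Neither form is a substitute for the other: the HTML-encoded string is for markup, and the JSON-encoded one is for a script or JSON response.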

symcbean
1

If you're outputting to HTML (and I recommend you always do), simply HTML encode on output to the page.

As client script code is only dangerous when interpreted by the browser, it only needs to be encoded on output. After all, to the database <script> is just text. To the browser <script> tells the browser to interpret the following text as executable code, which is why you should encode it to &lt;script&gt;.

The OWASP XSS Prevention Cheat Sheet shows how you should do this properly depending on output context. Things get complicated when outputting to JavaScript (you may need to hex encode and HTML encode in the right order), so it is often much easier to always output to an HTML tag and then read that tag using JavaScript in the DOM rather than inserting dynamic data in scripts directly.
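One way this "output to a tag, read it from the DOM" idea might look in PHP is sketched below. The helper name `render_tweet()`, the `id="tweet"` value, and the `data-text` attribute are all assumptions made for the example.

```php
<?php
// Hypothetical helper: render a tweet inside an element, with the same
// encoded text also exposed through a data attribute for client-side code.
function render_tweet(string $text): string
{
    $safe = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    return '<p id="tweet" data-text="' . $safe . '">' . $safe . '</p>';
}

echo render_tweet('<script> something_bad() </script>'), "\n";

// Client-side code can then read the decoded value from the DOM, e.g.:
//   document.getElementById('tweet').dataset.text
```

Only HTML encoding is needed on the server, because the dynamic value never appears inside a script block.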

At the very minimum you should be encoding the < and & characters and specifying the charset in a meta tag or HTTP header to avoid UTF-7 XSS.

SilverlightFox
0

You need to convert the HTML characters <, > (mainly) into their HTML equivalents &lt;, &gt;.

This will make < and > display in the browser without being executed; i.e., if you look at the source, an example might be &lt;script&gt;alert('xss')&lt;/script&gt;.

Before you input your data into your database - or on output - use htmlentities().

Further reading: https://www.owasp.org/index.php/XSS_%28Cross_Site_Scripting%29_Prevention_Cheat_Sheet

ʰᵈˑ
  • Removing the `<` and `>` is not enough to prevent all types of XSS. What if the text is shown as the alt of an image and I do `good_text" onclick="something_bad()`? – Alvaro Montoro Mar 18 '15 at 14:18
  • OP is receiving tweets. If OP converts `<` and `>` into their HTML entity codes, then he doesn't have to worry about the alt text (for this scenario) as he won't be inserting untrusted data into an `` tag – ʰᵈˑ Mar 18 '15 at 14:19
  • That's an assumption that is not specified in the question, and that could be misleading to people seeing this thread in the future – Alvaro Montoro Mar 18 '15 at 14:23
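The attribute case raised in these comments can be sketched as follows. The payload is the one from the comment; the `photo.png` file name and the variable names are made up for the example, and the attribute value is assumed to be wrapped in double quotes.

```php
<?php
$payload = 'good_text" onclick="something_bad()';

// Encoding only < and > leaves the double quote intact, so the payload
// can close the alt attribute and add an onclick handler.
$angle_only = str_replace(['<', '>'], ['&lt;', '&gt;'], $payload);
$unsafe = '<img src="photo.png" alt="' . $angle_only . '">';

// htmlspecialchars() with ENT_QUOTES also encodes " and ', so the
// payload stays inside the attribute value.
$safe = '<img src="photo.png" alt="'
    . htmlspecialchars($payload, ENT_QUOTES, 'UTF-8') . '">';

echo $unsafe, "\n";
echo $safe, "\n";
```

Quote encoding only helps if the attribute value is itself quoted in the markup; an unquoted attribute can be broken out of with a space.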