Reading the scary doc, I know that if I provide the wrong arguments to dangerouslySetInnerHTML(), my trousers are down for XSS. What must I do upstream of this function call to be sure that I can use it safely? Look for and strip <script> tags from user input? What else?
-
Where does the html originate? – demux Mar 04 '16 at 16:06
-
Rule of thumb: avoid using dangerouslySetInnerHTML() if possible. Look for other alternatives. – Mox Mar 04 '16 at 16:41
-
A great example and use case would be the result of [linkify-string](http://soapbox.github.io/linkifyjs/docs/linkify-string.html). It contains HTML but is safe to use, because it escapes HTML input and then processes the links. – Prinzhorn Mar 06 '16 at 09:45
1 Answer
CAVEAT: I am not a security expert; the following summarizes the best understanding I have accumulated as a working layman.
The best way to be sure your "dangerous" inner HTML is safe is to make sure you only ever set it to HTML that you have generated yourself. In other words, you never display any content that has come from an outside source. That probably sounds too strict, but there's a workaround: if you want to include "tainted" content in your dangerous HTML, you can parse the tainted content and re-generate it. The basic idea is that your parser only recognizes legitimate inputs, and ignores everything else. It then takes the parsed input, and generates safe outputs.
For example, let's say we have the following rules:
- A string is any sequence of A-Z, a-z, 0-9, and/or the punctuation marks period, comma, semicolon, colon, question mark, and exclamation point.
- A styled string is something like [b]bold[/b], [u]underline[/u], etc.
- Everything else is ignored.
Notice you're not blacklisting things like script tags, because you might not know everything that needs to be blacklisted. Instead, you're whitelisting certain specific things that you know are safe, and ignoring everything else. Once you're done parsing the input, you've got a list of known-safe strings and styled strings, and it's relatively straightforward to generate safe HTML output with embedded tags for styling.
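To make the parse-and-regenerate idea concrete, here is a minimal sketch of the rules above in TypeScript. The names `parseTainted` and `renderSafeHtml` are made up for illustration and are not part of React or any library. I have also assumed that spaces count as safe text and that only `[b]` and `[u]` are supported, which goes slightly beyond the rules as listed. Treat it as an illustration of the whitelist approach, not a vetted sanitizer.

```typescript
// Hypothetical helpers for illustration only; not a vetted sanitizer.
// Whitelisted text: letters, digits, and . , ; : ? !
// The space character is an added assumption so that ordinary sentences survive.
const SAFE_TEXT = /^[A-Za-z0-9.,;:?! ]+/;
// Whitelisted styling: [b]...[/b] or [u]...[/u] wrapping safe text only.
const STYLED = /^\[(b|u)\]([A-Za-z0-9.,;:?! ]*)\[\/\1\]/;

type Token =
  | { kind: "text"; text: string }
  | { kind: "styled"; tag: "b" | "u"; text: string };

// Recognize only the whitelisted forms; anything else is dropped one character at a time.
function parseTainted(input: string): Token[] {
  const tokens: Token[] = [];
  let rest = input;
  while (rest.length > 0) {
    const styled = STYLED.exec(rest);
    if (styled) {
      tokens.push({ kind: "styled", tag: styled[1] as "b" | "u", text: styled[2] });
      rest = rest.slice(styled[0].length);
      continue;
    }
    const text = SAFE_TEXT.exec(rest);
    if (text) {
      tokens.push({ kind: "text", text: text[0] });
      rest = rest.slice(text[0].length);
      continue;
    }
    rest = rest.slice(1); // unrecognized character: ignore it
  }
  return tokens;
}

// Regenerate HTML from the parsed tokens. Every tag in the output is written
// by this function; none is copied from the input.
function renderSafeHtml(tokens: Token[]): string {
  return tokens
    .map((t) => (t.kind === "styled" ? `<${t.tag}>${t.text}</${t.tag}>` : t.text))
    .join("");
}
```

With this sketch, `renderSafeHtml(parseTainted("Hi [b]there[/b]<script>alert(1)</script>!"))` produces `Hi <b>there</b>scriptalert1script!`: the angle brackets, slash, and parentheses never reach the output because the parser does not know how to read them. The resulting string is what you would pass as `__html` in `dangerouslySetInnerHTML={{ __html: ... }}`.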
Links and image tags are more difficult to handle safely, since any link or image could lead to malware, or to an innocuous-looking site that starts redirecting to malware a day or so later. The best way I know of to be safe with images is to require them to be uploaded to a server equipped with virus scanners (which are not 100% foolproof either). For links, the best approach I can think of is to make sure the actual URL is displayed alongside the link text, so readers can see where the link goes. But I would still use the same approach: write a parser that knows how to parse safe URLs (for links or images) and does NOT know how to parse unsafe URLs, then regenerate the link/image from the parsed data. That's still a lot riskier than just displaying styled text, but if you need links/images, that's the best way I know of.
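A rough sketch of the same idea for links, again in TypeScript. It leans on the standard `URL` constructor to do the parsing and only accepts `http:` and `https:`. That scheme list, the `rel` attribute, and the helper names `parseSafeUrl`, `escapeHtml`, and `renderSafeLink` are my own assumptions for illustration. Note that this only rejects URLs the parser does not understand (such as `javascript:` URLs); it does nothing about a well-formed link that points at a malicious site.

```typescript
// Hypothetical helpers for illustration only.
// Assumption: only absolute http(s) URLs are considered parseable.
const ALLOWED_PROTOCOLS = new Set(["http:", "https:"]);

// Returns a parsed URL for recognized input, or null for everything else.
function parseSafeUrl(raw: string): URL | null {
  let url: URL;
  try {
    url = new URL(raw.trim()); // throws on relative or malformed input
  } catch {
    return null;
  }
  // url.protocol is already lower-cased by the URL parser.
  return ALLOWED_PROTOCOLS.has(url.protocol) ? url : null;
}

// Escape characters that are significant in HTML text and attribute values.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Regenerate the link from the parsed URL and show the URL itself as the
// visible text, so readers can see where the link goes.
function renderSafeLink(raw: string): string {
  const url = parseSafeUrl(raw);
  if (url === null) {
    return ""; // unrecognized input is ignored
  }
  const href = escapeHtml(url.href);
  return `<a href="${href}" rel="noopener noreferrer nofollow">${href}</a>`;
}
```

Here `renderSafeLink("javascript:alert(1)")` and `renderSafeLink("not a url")` both return an empty string, while an `https://` URL comes back as an anchor whose `href` and visible text are the normalized, escaped URL.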
