44

When encoding possibly unsafe data, is there a reason to encode >?

  • It validates either way.
  • The browser interprets it the same either way (in the cases of attr="data", attr='data', and <tag>data</tag>).

I think the reasons somebody would do this are

  • To simplify regex based tag removal. <[^>]+>? (rare)
  • Non-quoted strings attr=data. :-o (not happening!)
  • Aesthetics in the code. (so what?)

Am I missing anything?

700 Software

6 Answers

37

Strictly speaking, to prevent HTML injection, you need only encode < as &lt;.

If user input is going to be put in an attribute, also encode " as &quot;.

If you're doing things right and using properly quoted attributes, you don't need to worry about >. However, if you're not certain of this you should encode it just for peace of mind - it won't do any harm.
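As a minimal sketch of the "encode it for peace of mind" approach, the following uses Python's standard `html.escape`, which encodes `&`, `<` and `>`, and with `quote=True` also `"` and `'`. The helper names `escape_text` and `escape_attr` and the sample input are hypothetical, chosen only for illustration; this is not code from the original answer.

```python
import html


def escape_text(value: str) -> str:
    """Escape a value for use as element content (encodes &, < and >)."""
    return html.escape(value, quote=False)


def escape_attr(value: str) -> str:
    """Escape a value for use inside a quoted attribute.

    quote=True additionally encodes both " and ', so the result is safe
    whichever quote character surrounds the attribute value.
    """
    return html.escape(value, quote=True)


# Hypothetical untrusted input that tries to break out of an attribute.
user_input = '"><script>alert(1)</script>'

print(f"<p>{escape_text(user_input)}</p>")
print(f'<input value="{escape_attr(user_input)}">')
```

Encoding `>` here costs nothing and does not change how the browser renders the value; it only adds a safety margin if an attribute is ever emitted without quotes.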

Niet the Dark Absol
  • 3
    **Security Warning:** This answer is incorrect. For a basic example, `'` is an acceptable attribute quote mark and not escaping it in such an attribute is an attack vector. There are also other attack vectors depending on the context. – Alexander O'Mara Jan 17 '16 at 22:43
  • 1
    It is true that `'` could be used instead of `"` for attribute quotation. In fact, it is possible to add attributes with no quotation marks at all. The developer should understand his application without making assumptions. In my case, all attributes are quoted using the latest standard `"` so this answer was correct for me. – 700 Software Nov 10 '16 at 17:22
  • You also must escape `&` as `&amp;`. – Nayuki Jan 13 '21 at 21:59
  • It is unclear why you 'must' escape & – PJUK Jan 13 '23 at 14:07
16

The HTML4 specification in its section 5.3.2 says that

authors should use "&gt;" (ASCII decimal 62) in text instead of ">"

so I believe you should encode the greater > sign as &gt; (because you should obey the standards).

Basile Starynkevitch
  • 1
    It is good to attempt to obey the standards where possible - but we all know that it is impossible to obey standards, and get your site to work on all (and I obviously mean IE6) browsers. So, common sense is permitted in certain circumstances - and if you can make something that works on all existing browsers, and you expect to work on all future browsers, and is common practice - then I am not sure that it is necessary to dogmatically follow standards. – Billy Moon Jan 25 '12 at 21:45
  • 2
    But in the original poster's case, it is possible, and simple, to obey the standard. Why should he do something against them when he can avoid that? – Basile Starynkevitch Jan 25 '12 at 21:48
  • 4
    The standard says SHOULD, not MUST. And more specifically: "...to avoid problems with older user agents". That means, if you don't target pre-1999 browsers, you need to do nothing. – user123444555621 Jan 25 '12 at 22:13
5

Current browsers' HTML parsers have no problems with unquoted >s.

Unfortunately, however, using regular expressions to "parse" HTML in JS is pretty common (example: Ext.util.Format.stripTags). Poorly written command line tools, IDEs, Java classes, etc. may also not be sophisticated enough to find the closing delimiter of an opening tag.

So, you may run into problems with code like this:

<script data-usercontent=">malicious();//"></script>

(Note how the syntax highlighter treats this snippet!)
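The failure mode can be sketched with the naive `<[^>]+>` pattern mentioned in the question. This is only an illustration in Python; `strip_tags` is a hypothetical helper and is not the actual implementation of Ext.util.Format.stripTags.

```python
import re

# Naive "tag" pattern: everything from "<" up to the first ">".
TAG_PATTERN = re.compile(r"<[^>]+>")


def strip_tags(markup: str) -> str:
    """Remove whatever the naive pattern considers a tag."""
    return TAG_PATTERN.sub("", markup)


raw = '<script data-usercontent=">malicious();//"></script>'
encoded = '<script data-usercontent="&gt;malicious();//"></script>'

# With a literal ">" inside the attribute, the pattern ends the tag too
# early and part of the attribute value leaks out as "text".
print(repr(strip_tags(raw)))

# With "&gt;" the first ">" really is the end of the tag, so nothing leaks.
print(repr(strip_tags(encoded)))
```

The browser treats both inputs identically; only the regex-based tool is fooled by the unencoded version.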

Community
user123444555621
  • Of course, depending on circumstances, you might actually want to do this on purpose to break amateur attempts at parsing your content (see https://xkcd.com/859/) – Niet the Dark Absol Nov 18 '15 at 18:41
0

Yes. If these signs are not encoded, XSS becomes possible on forms, social media sites, and many other places, because an attacker can inject a <script> tag. If you encode the signs, the browser will not execute the script but will display the characters as text instead.

coder
0

Always

This is to prevent XSS injections (through users using any of your forms to submit raw HTML or javascript). By escaping your output, the browser knows not to parse or execute any of it - only display it as text.

This may feel like less of an issue if you're not dealing with dynamic output based on user input. However, it's important to at least understand it, if not to make it a good habit.

leemeichin
-3

Encoding HTML characters is always a delicate job. You should always encode what needs to be encoded and always follow the standards. Using double quotes is standard, and even quotes inside double quotes should be encoded. ENCODE always. Imagine something like this:

<div> this is my text an img></div>

Probably the img> will be parsed by the browser as an image tag. Browsers always try to resolve unclosed tags or quotes. As Basile says, use the standards; otherwise you could get unexpected results without understanding the source of the errors.
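One way to probe how a given parser treats this snippet is to feed it to the parser and record which start tags it reports. The sketch below uses Python's standard `html.parser` module, which is not a browser engine, so it only shows how that particular parser behaves; `TagCollector` and `collect` are hypothetical names introduced for illustration.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record every start tag and every piece of text the parser reports."""

    def __init__(self) -> None:
        super().__init__()
        self.start_tags: list[str] = []
        self.text: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)


def collect(markup: str) -> tuple[list[str], str]:
    """Parse markup and return (start tags seen, concatenated text)."""
    parser = TagCollector()
    parser.feed(markup)
    parser.close()
    return parser.start_tags, "".join(parser.text)


tags, text = collect("<div> this is my text an img></div>")
print(tags)
print(repr(text))
```

Running the same check against the browsers you actually target is the only way to settle the question for those browsers.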

albanx
  • *"Probably the img> will be parsed from the browser as an image tag"*, I think not. – 700 Software Jul 11 '13 at 19:16
  • so you think not, really do you think? – albanx Jul 11 '13 at 21:40
  • Well, [let's see what other people think](http://stackoverflow.com/questions/17685535/would-the-browser-ever-try-to-parse-img). – 700 Software Jul 16 '13 at 19:41
  • 1
    I am sorry if I am being annoying. The reason for my downvote was your answer made an inaccurate statement. (that is what I think) The reason for my question was that it appeared you and I disagreed. If I were wrong about something, I would want someone to tell me, so I posted the question to find out for sure. – 700 Software Jul 16 '13 at 20:06
  • @GeorgeBailey (I do not know why I am returning to this but...) My answer does not make any inaccurate statement, it is correct nowadays, and it is not even different from the selected answer with 18 upvotes. So you posted a question knowing the answer. – albanx Jan 19 '17 at 10:13
  • (5 years later ☺) *"Probably the img> will be parsed from the browser as an image tag."* is not accurate. I doubt you'll be able to prove otherwise at this point. – 700 Software Jan 19 '17 at 12:25
  • 1
    @GeorgeBailey Your question is about "Encoding or Not" and my answer is "Yes you should". _Probably the img> will be parsed from the browser as an image tag._ is accurate, because you do not know what each browser's parsing engine will do. I doubt you wrote the browser code or tested this on every browser and every version. So that is why it is **Probably** and not **sure** – albanx Jan 19 '17 at 12:31
  • Probably=not accurate. Possibly=accurate. – 700 Software Jan 19 '17 at 12:34
  • If you are referring to ancient or poorly designed browsers that a) almost nobody uses and b) are known to be insecure, then I guess Probably could be accurate then; (though I doubt it) but that is not *practically* relevant to modern web development. – 700 Software Jan 19 '17 at 12:35
  • 1
    @GeorgeBailey again you're in error, simply because you're outside the context of your question and outside the standard specs. _Probably_ is much more correct than you believe – albanx Jan 19 '17 at 12:49
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/133554/discussion-between-george-bailey-and-albanx). – 700 Software Jan 19 '17 at 12:51