First off, I have done a lot of reading and research on this topic, but I am still a bit confused about what the best practice is.
I have checked, and read in full, all of the following very helpful pages on the subject:
- https://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet
- http://msdn.microsoft.com/en-us/library/ms437314.aspx
- how to encode href attribute in HTML
- HttpServerUtility.UrlPathEncode vs HttpServerUtility.UrlEncode
This is my setup (the `content` variable will later be rendered with `Html.Raw()`):
```csharp
content += "<a class=\"contentLink\" href=\"" + subRow.linkHref + "\" target=\"_blank\">" + subRow.linkText + "</a>";
```
Encoding `subRow.linkText` is simple enough (a plain `HtmlEncode` call secures it). However, like others before me, I am confused about how to encode the `href` attribute, given the resources linked above and today's best practices.
UPDATE: Everything inserted into the `href` attribute is user input. I want it this way so that users can link either to something they post on Google Drive (or any similar site) or, if they choose, to a relative path (constructed programmatically) pointing at an internal .pdf file or picture.
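Because the whole `href` is user input, one common approach is to check the URL's scheme against an allowlist before encoding it, since no amount of encoding changes what `javascript:` means at the start of a link. The sketch below is a minimal illustration, not a vetted library: `HrefValidator`, `IsSafeHref` and the allowlist (`http`, `https`, plus scheme-less relative paths) are hypothetical choices made for this example.

```csharp
using System;
using System.Text;

// Hypothetical helper, not from the question: checks a user-supplied href
// against an allowlist of URL schemes before it is encoded and rendered.
public static class HrefValidator
{
    // Assumption: only web links and relative paths should be accepted.
    private static readonly string[] AllowedSchemes = { "http", "https" };

    public static bool IsSafeHref(string input)
    {
        if (string.IsNullOrEmpty(input)) return false;

        // Browsers drop tab/CR/LF anywhere in a URL and ignore leading control
        // characters and spaces, so "java\tscript:" still means "javascript:".
        // Normalise the same way before looking for a scheme.
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            if (c != '\t' && c != '\r' && c != '\n') sb.Append(c);
        }
        string s = sb.ToString();
        int start = 0;
        while (start < s.Length && s[start] <= ' ') start++;
        s = s.Substring(start);
        if (s.Length == 0) return false;

        // A scheme is present only if ':' comes before any '/', '\', '?' or '#'.
        int delim = s.IndexOfAny(new[] { ':', '/', '\\', '?', '#' });
        if (delim < 0 || s[delim] != ':')
        {
            // No scheme: treat it as a relative path such as "files/report.pdf".
            return true;
        }

        // Schemes are case-insensitive, so "JavaScript:" is caught too.
        string scheme = s.Substring(0, delim).ToLowerInvariant();
        return Array.IndexOf(AllowedSchemes, scheme) >= 0;
    }
}
```

Input that fails this check could be rejected when the user saves the link, so only validated values ever reach the `content` string; the value still needs HTML encoding when it is rendered.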
I understand the differences between the methods available to me, but I am not sure which of them I should apply, how many, and in what order. Should I even use `HttpUtility.HtmlAttributeEncode`?
I am asking from the perspective of wanting my site not to break and, of course, wanting to prevent XSS.
UPDATE:
I tried inserting JavaScript into various parts of a URL through the user input that is later used in the `href` attribute, and I noticed a couple of oddities.
I am currently testing with this encoding setup:
```csharp
content += "<a class=\"contentLink\" href=\"" + HttpUtility.HtmlEncode(HttpUtility.UrlPathEncode(subRow.linkHref)) + "\" target=\"_blank\">" + HttpUtility.HtmlEncode(subRow.linkText) + "</a>";
```
In effect, I first URL-encode (with `UrlPathEncode`) and then HTML-encode. I believe this may be the correct order, because the HTML encoding is undone when the markup is parsed into the DOM, so the value should still work as a URL (I think).
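To see what the HTML-encoding step does and does not protect against on its own, here is a minimal sketch that mirrors the concatenation above. Assumptions: `LinkBuilder.BuildAnchor` is a hypothetical name, `System.Net.WebUtility.HtmlEncode` stands in for `HttpUtility.HtmlEncode` so the example does not depend on `System.Web`, and `UrlPathEncode` is left out. A `"` in the input can no longer close the attribute early, but a value such as `javascript:alert(1)` passes through unchanged, because it contains no characters that HTML encoding touches.

```csharp
using System;
using System.Net;

// Hypothetical helper mirroring the question's string concatenation,
// HTML-encoding both the href value and the link text.
public static class LinkBuilder
{
    public static string BuildAnchor(string href, string text)
    {
        // HtmlEncode turns " into &quot;, & into &amp; and < > into &lt; &gt;,
        // so a value such as  x" onmouseover="alert(1)  cannot end the
        // attribute early and inject a new one.
        string encodedHref = WebUtility.HtmlEncode(href ?? string.Empty);
        string encodedText = WebUtility.HtmlEncode(text ?? string.Empty);

        return "<a class=\"contentLink\" href=\"" + encodedHref +
               "\" target=\"_blank\">" + encodedText + "</a>";
    }
}
```

For example, `BuildAnchor("https://example.com/?a=1&b=2", "<b>hi</b>")` yields `href="https://example.com/?a=1&amp;b=2"` with the link text rendered as `&lt;b&gt;hi&lt;/b&gt;`, while `BuildAnchor("javascript:alert(1)", "t")` still produces `href="javascript:alert(1)"`, which is why a scheme check is usually paired with the encoding.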
However, as I stated, I have noticed a couple of oddities.
- I used this as the user input:
`http://localhost:10226/home.cshtml?javascript:var a = "hi"; alert(a); void(0);`
and no JavaScript executes, but I am not convinced that this is because of my encoding. (I could see modern browsers no longer allowing JavaScript to be run from the URL, since as I understand it that is a huge security hole and bad practice in general, but of course I can't bank on this.)
- After clicking the link generated from this test input, the address bar reads:
`http://localhost:10226/home.cshtml?javascript:var%20a%20=%20%22hi%22;%20alert(a);%20void(0);`
This is where I get a little confused. According to the research linked above, `UrlPathEncode` is supposed to skip encoding everything after the `?`, yet you can clearly see that it `%`-encoded the spaces in the query string portion of this URL. That is a good thing, I suppose, but it is not consistent with what I understand of the documentation.
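One possible explanation, offered as an assumption rather than a diagnosis: `UrlPathEncode` may really have left the query untouched, and the `%20`/`%22` in the address bar may instead come from the browser, which percent-encodes certain characters in the query of an `http(s)` URL when it parses it (the WHATWG URL Standard's "special-query percent-encode set"). The sketch below approximates only that browser step; `BrowserQueryEncoding.EncodeQuery` is a hypothetical name, and it is not what `HttpUtility.UrlPathEncode` does. Applied to the query from the test input, it reproduces the address-bar string shown above. The same reasoning may explain why nothing ran in the first test: the link's scheme is `http`, so `javascript:` inside the query is just text sent to the server, whereas script runs when the URL itself starts with `javascript:`.

```csharp
using System;
using System.Text;

// Hypothetical helper approximating how a browser percent-encodes the query
// part of an http(s) URL during parsing. Not HttpUtility.UrlPathEncode.
public static class BrowserQueryEncoding
{
    public static string EncodeQuery(string query)
    {
        var sb = new StringBuilder();
        foreach (byte b in Encoding.UTF8.GetBytes(query))
        {
            // Controls, space, DEL and non-ASCII bytes, plus " # < > ', are
            // percent-encoded; everything else (including : ; = ( )) is kept.
            bool encode = b <= 0x20 || b >= 0x7F ||
                          b == (byte)'"' || b == (byte)'#' ||
                          b == (byte)'<' || b == (byte)'>' || b == (byte)'\'';
            if (encode) sb.Append('%').Append(b.ToString("X2"));
            else sb.Append((char)b);
        }
        return sb.ToString();
    }
}
```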
I am still at a loss, but every local and external link I have tried has been neither broken nor, as far as I can tell, dangerous, so I will keep using this setup until my understanding is clarified.