I need to validate user input for an href on the server side and make sure that only http:// and https:// are allowed as a protocol (if one is specified at all). The objective is to eliminate possibly malicious code like javascript:... or anything similar.
What makes it difficult is the number of ways the colon can be encoded in such a string as an HTML character reference, e.g. &colon;, &#58;, &#058;, &#x3A;, &#x3a;. I'd like to transform the value and see it the way browsers do before they render the page.
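To illustrate the kind of normalization I mean, here is a rough sketch in Python (used only as a language-neutral illustration; `is_safe_href` and `ALLOWED_SCHEMES` are names made up for this example). It decodes character references with the standard library's `html.unescape`, drops the whitespace that, as far as I understand, the WHATWG URL parser ignores, and only then looks at the scheme. It is an approximation, not a browser-grade parser: for instance, it does not apply the attribute-specific rules for references that lack a trailing semicolon.

```python
import html
import re

# Hypothetical allow-list for this sketch.
ALLOWED_SCHEMES = {"http", "https"}

# Leading/trailing characters assumed to be ignored by browsers:
# C0 control characters and space (U+0000 through U+0020).
_C0_OR_SPACE = "".join(map(chr, range(0x21)))

# A scheme is a letter followed by letters, digits, '+', '-' or '.', then ':'.
_SCHEME_RE = re.compile(r"([A-Za-z][A-Za-z0-9+.\-]*):")


def is_safe_href(raw_attr_value: str) -> bool:
    """Return True if the href has no scheme or an allowed one."""
    # 1. Decode character references such as &colon;, &#58; or &#x3A;.
    decoded = html.unescape(raw_attr_value)
    # 2. Strip leading/trailing C0 control characters and spaces.
    cleaned = decoded.strip(_C0_OR_SPACE)
    # 3. Remove tabs and newlines anywhere in the value.
    cleaned = re.sub(r"[\t\n\r]", "", cleaned)
    # 4. No scheme at the start means a relative reference.
    match = _SCHEME_RE.match(cleaned)
    if match is None:
        return True
    return match.group(1).lower() in ALLOWED_SCHEMES
```

With this sketch, `is_safe_href("javascript&colon;alert(1)")` and `is_safe_href("jav&#x09;ascript:alert(1)")` are both rejected, while `is_safe_href("https://example.com")` and a relative value such as `/some/path` are accepted. What I am after is an equivalent that is as faithful to real browsers as AngleSharp's attribute parsing.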
One option would be to build a DOM document using AngleSharp, as it does a very good job of parsing attributes. I could then retrieve the value and validate it, but building a whole DOM tree just to parse one value seems like overkill. Is there a way to use AngleSharp to parse just an attribute value? Or is there a library I could use just for this task?
I also found this question, but the method used there does not really parse URIs the way browsers do.