I'm using .NET 4.5 and I'm trying to parse a URI query string into a NameValueCollection. The right way seems to be to use HttpUtility.ParseQueryString(string query), which takes the string obtained from Uri.Query and returns a NameValueCollection. Uri.Query returns a string that is escaped according to RFC 2396, and HttpUtility.ParseQueryString(string query) expects a string that is URL-encoded. Assuming RFC 2396 and URL-encoding are the same thing, this should work fine.
However, the documentation for ParseQueryString claims that it "uses UTF8 format to parse the query string". There is also an overloaded method which takes a System.Text.Encoding and then uses that instead of UTF8.
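To make this concrete, here is a minimal sketch of the two calls I'm comparing. The URL, the parameter name, and the class name are made up for illustration, and the code assumes the project references System.Web.dll.

```csharp
using System;
using System.Collections.Specialized;
using System.Text;
using System.Web; // HttpUtility lives in System.Web.dll

class QueryParsingSketch
{
    static void Main()
    {
        // Hypothetical URL; %C3%A9 is the UTF-8 byte sequence for "é".
        var uri = new Uri("http://example.com/search?name=caf%C3%A9");

        // Default overload: documented as using UTF8 to parse the query string.
        NameValueCollection asUtf8 = HttpUtility.ParseQueryString(uri.Query);

        // Overload that takes an explicit encoding (UTF-16 here, for comparison).
        NameValueCollection asUtf16 =
            HttpUtility.ParseQueryString(uri.Query, Encoding.Unicode);

        // Print both results so any difference between the encodings is visible.
        Console.WriteLine(asUtf8["name"]);
        Console.WriteLine(asUtf16["name"]);
    }
}
```

Both calls receive exactly the same .NET string, so whatever differs between the two outputs must come from the encoding argument alone.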
My question is: what does it mean to use UTF8 as the encoding? The input is a string, which by definition (in C#) is UTF-16. How is that interpreted as UTF-8? What is the difference between using UTF8 and UTF16 as the encoding in this case? My concern is that since I'm accepting arbitrary user input, there might be some security risk if I botch the encoding (e.g. the user might be able to slip some script exploit through).
There is a previous question on this topic (How to parse a query string into a NameValueCollection in .NET), but it doesn't specifically address the encoding problem.