How can I determine programmatically whether a string has been encoded in C#?
For example, take this string:
&lt;p&gt;test&lt;/p&gt;
I would like my logic to recognize that this value has been encoded. Any ideas? Thanks.
You can use HttpUtility.HtmlDecode() to decode the string, then compare the result with the original string. If they're different, the original string was probably encoded (at least, the routine found something to decode inside):
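// Requires a reference to System.Web (using System.Web;).
public bool IsHtmlEncoded(string text)
{
    // Decoding changes the text only if it contained at least one entity.
    return HttpUtility.HtmlDecode(text) != text;
}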
Strictly speaking, that's not possible. The string &lt;p&gt;test&lt;/p&gt; might actually be the intended text, in which case its encoded version would be &amp;lt;p&amp;gt;test&amp;lt;/p&amp;gt;.
You could look for HTML entities in the string and decode it until there are none left, but it's risky to decode data that way, as it assumes things that might not be true.
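To make that risk concrete, here is a minimal sketch of the "decode until nothing is left" idea. It assumes System.Net.WebUtility is available; the class name, the method name DecodeUntilStable, and the maxPasses cap are made up for illustration and are not from any answer here.

```csharp
using System;
using System.Net;

public static class AmbiguityDemo
{
    // Hypothetical helper: decodes repeatedly until the text stops changing,
    // giving up after maxPasses passes (an arbitrary safety cap).
    public static string DecodeUntilStable(string text, int maxPasses = 10)
    {
        for (int i = 0; i < maxPasses; i++)
        {
            string decoded = WebUtility.HtmlDecode(text);
            if (decoded == text)
            {
                break; // nothing left to decode
            }
            text = decoded;
        }
        return text;
    }
}
```

A double-encoded "&amp;lt;p&amp;gt;" comes back as "<p>", which is probably what you want. But a sentence that was meant literally, such as "Use &amp; for an ampersand", comes back as "Use & for an ampersand", and the code has no way to know that the author wanted the entity to stay.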
This is my take on it. If the user passes in partially encoded text, this will catch it.
// Returns true when val survives a decode/encode round trip unchanged.
private bool EncodeText(string val)
{
    string decodedText = HttpUtility.HtmlDecode(val);
    string encodedText = HttpUtility.HtmlEncode(decodedText);
    return encodedText.Equals(val, StringComparison.OrdinalIgnoreCase);
}
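Here is a rough sketch of how the same round trip can tell three cases apart instead of returning a single bool. It uses System.Net.WebUtility so it runs without System.Web; the EncodingState enum and the Classify name are my own inventions for the example.

```csharp
using System;
using System.Net;

// Hypothetical classification, for illustration only.
public enum EncodingState { Plain, Encoded, RawOrMixed }

public static class RoundTripCheck
{
    public static EncodingState Classify(string val)
    {
        string decoded = WebUtility.HtmlDecode(val);
        string reencoded = WebUtility.HtmlEncode(decoded);

        // If re-encoding does not give back the input, the input contained
        // raw special characters (or entities in a non-canonical form).
        if (reencoded != val)
        {
            return EncodingState.RawOrMixed;
        }

        // The round trip was stable: either nothing needed encoding,
        // or everything was already encoded.
        return decoded == val ? EncodingState.Plain : EncodingState.Encoded;
    }
}
```

With this, "&lt;b&gt;" classifies as Encoded, "&lt;b>" as RawOrMixed, and "test" as Plain. Note that an entity written in another form, such as "&#60;", also lands in RawOrMixed because the encoder writes it back as "&lt;".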
I use the NeedsEncoding() method below to determine whether a string needs encoding.
Results
-----------------------------------------------------
b          --> NeedsEncoding = True
&lt;b>     --> NeedsEncoding = True
<b>        --> NeedsEncoding = True
&lt;b&lt;  --> NeedsEncoding = False
&quot;     --> NeedsEncoding = False
Here are the helper methods; I split the logic into two methods for clarity. As Guffa says, it is risky and hard to produce a bulletproof method.
public static bool IsEncoded(string text)
{
    // The checks below fix false positives such as &lt;<>.
    // You could add a complete blacklist,
    // but these are the ones that cause HTML injection issues.
    if (text.Contains("<")) return false;
    if (text.Contains(">")) return false;
    if (text.Contains("\"")) return false;
    if (text.Contains("'")) return false;
    if (text.Contains("script")) return false;

    // If the decoded string differs from the original, it is already encoded.
    return System.Web.HttpUtility.HtmlDecode(text) != text;
}

public static bool NeedsEncoding(string text)
{
    return !IsEncoded(text);
}
A simple way of detecting this would be to check for characters that are not allowed in an encoded string, such as < and >.
All I can suggest is that you replace known encoded sections with the decoded string.
replace("&lt;", "<")
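Spelled out a bit more, the replace approach might look like the sketch below. The entity list is deliberately tiny and the class and method names are placeholders of mine; the one detail worth noting is that &amp; is handled last so that a double-encoded "&amp;lt;" only loses one level of encoding.

```csharp
using System;

public static class KnownEntityReplacer
{
    // Hypothetical short list of entities to undo. "&amp;" must stay last:
    // replacing it first would turn "&amp;lt;" into "&lt;" and then into "<".
    private static readonly (string Entity, string Text)[] Known =
    {
        ("&lt;", "<"),
        ("&gt;", ">"),
        ("&quot;", "\""),
        ("&amp;", "&"),
    };

    public static string DecodeKnown(string text)
    {
        foreach (var (entity, replacement) in Known)
        {
            // string.Replace(string, string) is an ordinal, case-sensitive match.
            text = text.Replace(entity, replacement);
        }
        return text;
    }
}
```

This only covers the entities you list, so numeric references such as "&#60;" pass through untouched.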
I'm doing .NET Core 2.0 development and I'm using System.Net.WebUtility.HtmlDecode, but I have a situation where strings being processed in a microservice might have an indeterminate number of encodings performed on some strings. So I put together a little recursive method to handle this:
public string HtmlDecodeText(string value, int decodingCount = 0)
{
    if (decodingCount < 0)
    {
        decodingCount = 1;
    }

    // Don't go past 4 levels of decoding to prevent possible stack overflow,
    // and because we don't have a valid use case for that level of multi-decoding.
    if (decodingCount >= 4)
    {
        return value;
    }

    var decodedText = WebUtility.HtmlDecode(value);

    // If decoded text equals the original text, then we know decoding is done.
    if (decodedText.Equals(value, StringComparison.OrdinalIgnoreCase))
    {
        return value;
    }

    return HtmlDecodeText(decodedText, ++decodingCount);
}
And here I called the method on each item in a list where strings were encoded:
result.FavoritesData.folderMap.ToList().ForEach(x => x.Name = HtmlDecodeText(x.Name));
Try this answer: Determine a string's encoding in C#
Another Code Project article might be of help: http://www.codeproject.com/KB/recipes/DetectEncoding.aspx
You could also use regex to match on the string content...
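As a rough idea of what the regex could look like, the sketch below matches anything shaped like a named, decimal, or hex character reference. The pattern and the EntityScanner name are my own assumptions, and it only checks the shape: it does not verify that a name like "&foo;" is a real entity.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EntityScanner
{
    // Named (&lt;), decimal (&#60;) and hex (&#x3C;) character references.
    private static readonly Regex EntityPattern = new Regex(
        @"&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);",
        RegexOptions.Compiled);

    // True if the text contains at least one entity-shaped token.
    public static bool ContainsEntity(string text)
    {
        return EntityPattern.IsMatch(text);
    }

    // Number of entity-shaped tokens in the text.
    public static int CountEntities(string text)
    {
        return EntityPattern.Matches(text).Count;
    }
}
```

Like the other approaches, this gives false positives: "AT&T; rocks" contains "&T;", which matches the pattern even though it is not an entity.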