16

How can I programmatically determine whether a string has been HTML-encoded in C#?

For example, take the string:

&lt;p&gt;test&lt;/p&gt;

I would like my logic to recognize that this value has been encoded. Any ideas? Thanks.

Rod
GibboK

8 Answers

61

You can use HttpUtility.HtmlDecode() to decode the string, then compare the result with the original string. If they're different, the original string was probably encoded (at least, the routine found something to decode inside):

public bool IsHtmlEncoded(string text)
{
    return (HttpUtility.HtmlDecode(text) != text);
}
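
As a rough sketch of how this check behaves, the following wraps it in a hypothetical `HtmlEncodingCheck` class and uses `System.Net.WebUtility.HtmlDecode`, which is assumed here only so the example compiles without a reference to `System.Web`. It also shows the main caveat: any input containing a single decodable entity is reported as encoded.

```csharp
using System.Net;

// Hypothetical helper class, for illustration only.
public static class HtmlEncodingCheck
{
    // Returns true when decoding changes the text, i.e. the text
    // contains at least one entity that the decoder recognizes.
    public static bool IsHtmlEncoded(string text)
    {
        return WebUtility.HtmlDecode(text) != text;
    }
}
```

With this sketch, `HtmlEncodingCheck.IsHtmlEncoded("&lt;p&gt;test&lt;/p&gt;")` is true and `HtmlEncodingCheck.IsHtmlEncoded("<p>test</p>")` is false. Mixed input such as `"\">&amp;"` is also reported as true, even though the quote and the `>` are still raw.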
Frédéric Hamidi
  • Good solution very elegant.... but what about if I do not want to compare two strings and I would like to know if the string is encoded? I appreciate your help on this thanks! – GibboK Dec 22 '10 at 10:01
  • AFAIK, that's what he said: `IsHtmlEncoded("&lt;p&gt;test&lt;/p&gt;")` should give `true`, so it's encoded. – Aerus Dec 22 '10 at 10:05
  • 7
    @GIbboK: - I want to drink a cup of coffee. - Use a cup and coffee! - But what about if I do not want to use cup, coffee and drink it, but just want to have it in my stomach? – Michael Sagalovich Dec 22 '10 at 10:08
  • This is the answer that should be marked. It's programmatically the most logical solution. – Christopher Douglas Dec 23 '13 at 18:54
  • 8
    Very bad if you're trying to test user-entered values. If I enter `">&amp;` then it'll decode `&amp;`, the strings won't match, and it'll be counted as encoded. – Tarka Nov 17 '16 at 22:38
  • This is not valid for `CJK` strings. If you call `UrlDecode` on `CJK` strings it garbles them. – Mayank Aug 23 '17 at 00:37
  • 1
    Downvoted this because it simply is not effective in many cases. Tarka pointed out one scenario and I can confirm it. – onefootswill Oct 04 '18 at 01:35
  • The `HttpUtility.HtmlDecode` method is in the `System.Web` namespace. – Venkat Jan 17 '19 at 10:01
  • What if I received the encoded string from outside and don't have the original string? Then what do I compare it with? – Rao khurram adeel Feb 20 '19 at 19:16
10

Strictly speaking that's not possible. What the string contains might actually be the intended text, and the encoded version of that would be &amp;lt;p&amp;gt;test&amp;lt;/p&amp;gt;.

You could look for HTML entities in the string and decode it until there are none left, but it's risky to decode data that way, as it assumes things that might not be true.
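
The ambiguity can be made concrete with a small sketch. The `EncodingAmbiguity` class and its method names are hypothetical, and `System.Net.WebUtility` is assumed as the encoder and decoder. Counting how many decode passes change a string says how many passes are possible, not how many encodings were actually applied.

```csharp
using System.Net;

// Hypothetical helper class, for illustration only.
public static class EncodingAmbiguity
{
    // Applies one level of HTML encoding.
    public static string EncodeOnce(string text)
    {
        return WebUtility.HtmlEncode(text);
    }

    // Decodes repeatedly and returns how many passes changed the text.
    // maxPasses is an arbitrary safety cap chosen for this sketch.
    public static int CountDecodePasses(string text, int maxPasses = 10)
    {
        int passes = 0;
        string current = text;
        while (passes < maxPasses)
        {
            string decoded = WebUtility.HtmlDecode(current);
            if (decoded == current)
            {
                break; // nothing left that the decoder recognizes
            }
            current = decoded;
            passes++;
        }
        return passes;
    }
}
```

If the intended text really is `&lt;p&gt;test&lt;/p&gt;` and it is encoded once, the result is `&amp;lt;p&amp;gt;test&amp;lt;/p&amp;gt;`, and `CountDecodePasses` reports two passes for it. The same result would be produced by encoding `<p>test</p>` twice, so the count alone cannot tell the two cases apart.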

Guffa
4

This is my take on it. If the user passes in partially encoded text, this will catch it.

private bool EncodeText(string val)
{
    // Decode, then re-encode; fully encoded input survives the round trip unchanged.
    string decodedText = HttpUtility.HtmlDecode(val);
    string encodedText = HttpUtility.HtmlEncode(decodedText);

    return encodedText.Equals(val, StringComparison.OrdinalIgnoreCase);
}
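
To see what a round-trip check of this kind accepts and rejects, here is a minimal sketch. The `RoundTripCheck` class is hypothetical, and it assumes `System.Net.WebUtility` and an ordinal comparison rather than the exact calls above.

```csharp
using System;
using System.Net;

// Hypothetical helper class, for illustration only.
public static class RoundTripCheck
{
    // True when decoding and re-encoding reproduces the input exactly,
    // which means the input contains no raw characters that the encoder
    // would have escaped and no entity spelled differently from the
    // encoder's own output.
    public static bool SurvivesRoundTrip(string value)
    {
        string decoded = WebUtility.HtmlDecode(value);
        string reencoded = WebUtility.HtmlEncode(decoded);
        return string.Equals(reencoded, value, StringComparison.Ordinal);
    }
}
```

Under these assumptions, `&lt;p&gt;test&lt;/p&gt;` passes, while `<p>test</p>` and the partially encoded `&lt;b>` fail. Two edge cases are worth knowing: plain text such as `test` also passes, because there is nothing to encode, and `&#60;p&#62;` fails, because the encoder writes `&lt;` and `&gt;` rather than the numeric forms.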
elvis
2

I use the NeedsEncoding() method below to determine whether a string needs encoding.

Results 
-----------------------------------------------------
b               -->      NeedsEncoding = True
&lt;b>          -->      NeedsEncoding = True
<b>             -->      NeedsEncoding = True
&lt;b&lt;       -->      NeedsEncoding = False
&quot;          -->      NeedsEncoding = False

Here are the helper methods; I split the logic into two methods for clarity. As Guffa says, it is risky and hard to produce a bulletproof method.

    public static bool IsEncoded(string text)
    {
        // below fixes false positive &lt;<> 
        // you could add a complete blacklist, 
        // but these are the ones that cause HTML injection issues
        if (text.Contains("<")) return false;
        if (text.Contains(">")) return false;
        if (text.Contains("\"")) return false;
        if (text.Contains("'")) return false;
        if (text.Contains("script")) return false;

        // if decoding changes the string, it contained entities, so treat it as already encoded
        return (System.Web.HttpUtility.HtmlDecode(text) != text);
    }

    public static bool NeedsEncoding(string text)
    {
        return !IsEncoded(text);
    }
Ian G
0

A simple way of detecting this would be to check for characters that are not allowed in an encoded string, such as < and >.
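
A minimal sketch of that idea follows; the `RawMarkupCheck` class and the choice of exactly these two characters are assumptions made for illustration.

```csharp
// Hypothetical helper class, for illustration only.
public static class RawMarkupCheck
{
    // Characters that an HTML encoder would have escaped.
    private static readonly char[] RawMarkupChars = { '<', '>' };

    // True when the text still contains a raw '<' or '>'.
    public static bool ContainsRawMarkup(string text)
    {
        return text.IndexOfAny(RawMarkupChars) >= 0;
    }
}
```

Finding one of these characters shows the string is not fully encoded. The reverse does not hold: a string without them, such as `test`, may simply never have needed encoding.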

Randam
0

All I can suggest is that you replace known encoded sections with the decoded string.

replace("&lt;", "<")
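
In C# this could look roughly like the sketch below. The `ManualDecoder` class and the particular list of entities are assumptions; a real input may contain many other entities that this does not handle.

```csharp
// Hypothetical helper class, for illustration only.
public static class ManualDecoder
{
    // Replaces a handful of common entities with their characters.
    public static string DecodeKnownEntities(string text)
    {
        return text
            .Replace("&lt;", "<")
            .Replace("&gt;", ">")
            .Replace("&quot;", "\"")
            .Replace("&#39;", "'")
            // &amp; goes last so that "&amp;lt;" becomes "&lt;"
            // instead of being decoded twice into "<".
            .Replace("&amp;", "&");
    }
}
```

For example, `ManualDecoder.DecodeKnownEntities("&lt;p&gt;test&lt;/p&gt;")` gives `<p>test</p>`, and `ManualDecoder.DecodeKnownEntities("&amp;lt;")` gives `&lt;`, removing only one level of encoding.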
Mark
0

I'm doing .NET Core 2.0 development and I'm using System.Net.WebUtility.HtmlDecode, but I have a situation where strings being processed in a microservice might have an indeterminate number of encodings performed on some strings. So I put together a little recursive method to handle this:

    public string HtmlDecodeText(string value, int decodingCount = 0)
    {
        // Stop once decoding no longer changes the text. Cap the recursion at
        // 4 levels to prevent a possible stack overflow, and because we don't
        // have a valid use case for deeper multi-decoding.

        if (decodingCount < 0)
        {
            decodingCount = 0;
        }

        if (decodingCount >= 4)
        {
            return value;
        }

        var decodedText = WebUtility.HtmlDecode(value);

        // Static Equals avoids a NullReferenceException when value is null.
        if (string.Equals(decodedText, value, StringComparison.Ordinal))
        {
            return value;
        }

        return HtmlDecodeText(decodedText, decodingCount + 1);
    }

And here I called the method on each item in a list where strings were encoded:

  result.FavoritesData.folderMap.ToList().ForEach(x => x.Name = HtmlDecodeText(x.Name));
David Spenard
-1

Try this answer: Determine a string's encoding in C#

A CodeProject article might also help: http://www.codeproject.com/KB/recipes/DetectEncoding.aspx

You could also use a regex to match on the string content.

Aim Kai