
I wanna check the encoding in a string with C#.

Is there any possible way?
I was trying with StreamReader, but I don't have a path.

foreach (string um in userMasterList)
{
    counter++;
    TextInfo textInfo = new CultureInfo("en-US", false).TextInfo;
    string finalName = null;

    if (!string.IsNullOrEmpty(um))
    {
        //string str =  StreamReader.CurrentEncoding.ToString();
        string Name = um.Trim();
        //StreamReader sr = new StreamReader(Name);
        // MessageBox.Show(sr.CurrentEncoding.ToString());
        if (!Regex.IsMatch(Name, "^[a-zA-Z0-9]*$"))
        {
            finalName = GreekToLower(Name);
            finalName = textInfo.ToTitleCase(finalName);
        }
        else
        {
            finalName = textInfo.ToTitleCase(Name.ToLower());
        }
        finalList.Add(finalName);
    }
    else
    {
        finalList.Add("-");
    }
}
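On the StreamReader attempt: StreamReader does not need a file path. It accepts any `Stream`, so raw bytes can be wrapped in a `MemoryStream`. The following is a minimal sketch that assumes you still have the raw bytes (not an already-decoded `string`) and are using C# 8 or later. `BomSniffer` and `DetectFromBom` are hypothetical names. StreamReader only recognizes byte order marks. Without a BOM, it reports the fallback encoding you passed in, so it cannot tell UTF-8 from Windows-1252 in BOM-less data.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper: reports which encoding StreamReader settles on for a byte array.
public static class BomSniffer
{
    public static Encoding DetectFromBom(byte[] bytes, Encoding fallback)
    {
        // Wrap the bytes in a MemoryStream: no file path is required.
        using var reader = new StreamReader(
            new MemoryStream(bytes), fallback, detectEncodingFromByteOrderMarks: true);

        // CurrentEncoding is only updated after the reader has read some data.
        reader.Peek();

        // A Unicode encoding if a BOM was found, otherwise `fallback`.
        return reader.CurrentEncoding;
    }
}
```

For example, `BomSniffer.DetectFromBom(new byte[] { 0xEF, 0xBB, 0xBF, 0x41 }, Encoding.ASCII).CodePage` is 65001 (UTF-8). For BOM-less bytes, the same call just returns the ASCII fallback's code page, 20127.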
  • .NET strings are Unicode, and so are Windows strings: Windows has been natively Unicode since Windows NT back in 1994. There's no encoding to talk about. – Panagiotis Kanavos Mar 12 '21 at 10:42
  • [UTF-16 Unicode](https://www.joelonsoftware.com/articles/Unicode.html), that is. – GSerg Mar 12 '21 at 10:43
  • "I wanna check the encoding in a string with C#" - this is a little like asking whether an `int` is in decimal or hex. The concept just doesn't apply. – Jon Skeet Mar 12 '21 at 10:45
  • The code you posted doesn't deal with encodings at all. If you have an encoding problem, it was caused when the strings were loaded from a file or database. Encoding problems are caused when non-Unicode files are read using the wrong codepage. – Panagiotis Kanavos Mar 12 '21 at 10:45
  • I can write `Αυτό Εδώ` knowing that SO, an ASP.NET site storing text in SQL Server using Unicode fields (nvarchar) can display the text without any problems. There are a lot of questions with Chinese or Japanese text. None of these required specialized handling – Panagiotis Kanavos Mar 12 '21 at 10:49
  • Until `Utf8String` lands in .NET "some future version", the answer to this is simple: "it is UTF-16, not UTF-8"; and even after that, `string` and `Utf8String` won't be interchangeable, so there won't be any confusion: you'll always know what you're using. Beyond that: encoding only matters when reading/writing strings to byte-streams – Marc Gravell Mar 12 '21 at 10:52
  • Most likely you want something like this: https://stackoverflow.com/questions/19519685/detect-encoding-of-byte-array-c-sharp – Thomas Koelle Mar 12 '21 at 10:53
  • What's the actual problem? I can say that since 2002 when .NET 1.0 came out, encoding problems in Greek apps and sites have gone away *unless* the developer explicitly broke the application by trying to "fix" what didn't need fixing. Or tried to use non-Unicode data with the wrong codepage. – Panagiotis Kanavos Mar 12 '21 at 10:54
  • @ThomasKoelle the answer there is `In short, no.`. `StreamReader` [already tries to detect Unicode](https://referencesource.microsoft.com/#mscorlib/system/io/streamreader.cs,133) but that's it. There's no reliable way to detect the codepage without a BOM. You can try reading the entire file using different codepages and discard those that produce error characters, but you can't be sure which codepage is the correct one. – Panagiotis Kanavos Mar 12 '21 at 10:57
  • What is the actual problem? Are you trying to recover mangled text caused by buggy file reading code? Fix the bug instead. To reverse a bad codepage conversion you have to *guess* not only what the original codepage was but what the *wrong* codepage was. You can't automate this - some conversions will result in `�`, the Unicode replacement character which tells you the combination was wrong. Others produce garbled text. You can discard those only if you have an idea of the file contents and what characters or words are expected. Otherwise you have to test N^2 combinations and inspect results – Panagiotis Kanavos Mar 12 '21 at 11:12
  • Guys, my issue is that `Name` is read from the DB and it's in ANSI. I want to detect whether it's encoded in UTF-8 or in ANSI. Any help? – Elias Shamoun Mar 12 '21 at 11:46
  • You need to provide a *lot* more information about *exactly* what you mean. How are you reading it from the database? What *exactly* do you mean by "its in ANSI"? What database are you using? – Jon Skeet Mar 12 '21 at 12:00
  • ANSI means the default encoding. Take ADAM ALΕΧΑNDER: copy and paste this name into Notepad++ and change the encoding from UTF-8 to ANSI, and you'll see that it contains chars like this: ADAM ALΕΧΑNDER. I want to detect this; that's my task. – Elias Shamoun Mar 12 '21 at 12:11
  • @EliasShamoun When I do that on my system, I see ADAM ALО•О§О‘NDER, which is different from your system. As explained above, the strings in C# are UTF-16 Unicode, period. After you have put the value in a string, it is already too late to fix it. You must fix the thing that reads the value from the database by telling it the correct encoding in which it must perform the reading (and the answer to that is not "ANSI"). – GSerg Mar 12 '21 at 12:28
  • Yes, it's not ANSI, it's the 1252 encoding. I searched and found this: `string Name = um.Trim(); string text1252 = encoding.GetString(encoding.GetBytes(Name)); if (Name.Equals(text1252, StringComparison.Ordinal)) { } else { Convert.ToBase64String(Encoding.UTF8.GetBytes(Name)); }` – Elias Shamoun Mar 12 '21 at 12:38
  • @EliasShamoun That code makes no sense. `text1252` is equal to `Name`, and at no point encoding to codepage 1252 occurs. All strings are Unicode UTF-16. Please fix the database reader, not the strings. – GSerg Mar 12 '21 at 13:10
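The comments above describe how garbled names arise. Bytes are decoded with the wrong code page, and undoing that requires knowing both the real encoding and the wrong one. The following is a minimal sketch that assumes the text was stored as UTF-8 and misread as Windows-1252. It also assumes .NET Core 3.0 or later. On .NET Framework, `CodePagesEncodingProvider` comes from the System.Text.Encoding.CodePages NuGet package. `MojibakeDemo` and its methods are hypothetical names.

```csharp
using System.Text;

// Hypothetical demo; assumes .NET Core 3.0+, where code page 1252 is only
// available after registering CodePagesEncodingProvider.
public static class MojibakeDemo
{
    // Runs once, before any static member of this class is first used.
    static MojibakeDemo() =>
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

    static Encoding Cp1252 => Encoding.GetEncoding(1252);

    // Reproduces the bug: text stored as UTF-8 bytes, decoded as Windows-1252.
    public static string ReadUtf8AsCp1252(string text) =>
        Cp1252.GetString(Encoding.UTF8.GetBytes(text));

    // Undoes exactly that mistake: re-encode with the wrong code page to get
    // the original bytes back, then decode them with the right one (UTF-8).
    // This only works because both encodings are known in advance.
    public static string UndoCp1252Misread(string mangled) =>
        Encoding.UTF8.GetString(Cp1252.GetBytes(mangled));
}
```

For example, `MojibakeDemo.ReadUtf8AsCp1252("ALΕΧΑNDER")` (with Greek Ε, Χ, Α) returns `ALÎ•Î§Î‘NDER`, and `UndoCp1252Misread` turns that back into the original. The robust fix is still to decode correctly where the data is read, rather than repairing strings afterwards.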


Not sure whether this will answer your question:

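// Requires: using System; using System.Text;
// Truncates each char to its low byte, decodes those bytes with `encoding`,
// and checks whether the result matches the original string exactly.
public static bool CheckEncoding(string value, Encoding encoding)
{
    byte[] bytes = new byte[value.Length];
    for (int i = 0; i < value.Length; i++)
    {
        bytes[i] = (byte)value[i]; // keeps only the low 8 bits of each UTF-16 code unit
    }
    // Ordinal, not culture-aware: we want an exact char-by-char match.
    return string.Equals(encoding.GetString(bytes), value, StringComparison.Ordinal);
}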

Calling Code:

CheckEncoding("Prüfung", Encoding.ASCII); //false
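The following uses a different technique from the answer above. It addresses the narrower question "can this string be represented in code page X without loss?" by asking the encoder to throw instead of silently substituting `?`. This is a minimal sketch, and `EncodingCheck.CanEncode` is a hypothetical name. It only checks representability. It cannot tell which encoding a string originally came from, because a .NET string holds UTF-16 regardless of its source.

```csharp
using System.Text;

public static class EncodingCheck
{
    // Returns true if `value` can be encoded in `codePage` with no replacement
    // characters. On .NET Core, code pages other than ASCII (20127), Latin-1
    // (28591) and the Unicode encodings need CodePagesEncodingProvider registered.
    public static bool CanEncode(string value, int codePage)
    {
        Encoding strict = Encoding.GetEncoding(
            codePage, EncoderFallback.ExceptionFallback, DecoderFallback.ExceptionFallback);
        try
        {
            strict.GetBytes(value);
            return true;
        }
        catch (EncoderFallbackException)
        {
            return false;
        }
    }
}
```

With this sketch, `EncodingCheck.CanEncode("Prüfung", 20127)` is false because ASCII has no `ü`, while `EncodingCheck.CanEncode("Prüfung", 28591)` is true.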
  • As it’s currently written, your answer is unclear. Please [edit] to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Jul 14 '22 at 02:11