1

I have a Base64 string that I want to decode and interpret as UTF-8, like this:

byte[] encodedDataAsBytes = System.Convert.FromBase64String(vcard);
return Encoding.UTF8.GetString(encodedDataAsBytes);

I do this because umlauts in the string need to be displayed correctly. The problem is that when I use UTF-8 as the encoding, the umlauts are NOT handled correctly. But when I use UTF-7

return Encoding.UTF7.GetString(encodedDataAsBytes);

everything works fine.

Why is that? Shouldn't UTF-8 be able to handle umlauts?

  • 5
    UTF-8 _is_ very much able to handle umlauts. But if your data _is UTF-7_, then treating it as UTF-8 _will not work_, just as you get broken data when you treat a UTF-8 string as UTF-7. (Don't forget: encoding is *not* something inherent in the string you are reading. It's just a defined way to _interpret_ binary data as text.) – Franz Gleichmann Sep 18 '20 at 07:36
  • 2
    What you get from decoding Base64 is a byte representation of an _encoded_ string. If that string was encoded in Windows-1252, then this might work because UTF-7 and Windows-1252 happen to encode these characters the same way (I didn't check, but if it works, it seems so), while UTF-8 certainly does not. – Fildor Sep 18 '20 at 07:39
  • 1
    Please provide a few examples (base 64 encoded string, expected result) so we can reproduce it. – Codo Sep 18 '20 at 09:15

2 Answers

1

Your vcard is UTF-7 encoded.

This is why Encoding.UTF7.GetString(encodedDataAsBytes); gives you the right result.

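Once the text has been encoded to bytes, you cannot choose a different encoding when decoding it; the decoder has to match the encoding that was used.

To use UTF-8, you would need access to the original string before it was encoded and assigned to the variable vcard.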

AndrewR
-1

I had a similar problem. In my case, I used JavaScript's btoa() to encode a filename to Base64 in the web UI and sent it to the server. On the server side (.NET Core), I used the code below to decode it back to a filename string.

// Note: encodedFilename is the result of btoa() from the client web UI.
var raw = Convert.FromBase64String(encodedFilename);
var filename = Encoding.UTF8.GetString(raw);

It failed to decode ä. However, it worked when I used Encoding.UTF7, but I don't think that is the right solution. I believe this is due to a mismatch between the encoding and decoding sides: btoa() is binary to ASCII, so it treats each character of its input as a single byte and does not produce UTF-8. What I really need is b64EncodeUnicode().

function b64EncodeUnicode(str) {
    return btoa(encodeURIComponent(str).replace(/%([0-9A-F]{2})/g, function(match, p1) {
        return String.fromCharCode('0x' + p1);
    }));
}

Code Reference: https://developer.mozilla.org/en-US/docs/Glossary/Base64
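
To make the mismatch concrete, here is a minimal sketch of what I think is going on. It assumes an environment where btoa(), TextEncoder and TextDecoder are available as globals (modern browsers, recent Node.js). The helper name utf8ToBase64 is made up for this illustration and is not part of any library.

```javascript
// Hypothetical helper: Base64-encode the UTF-8 bytes of a string.
function utf8ToBase64(str) {
    const bytes = new TextEncoder().encode(str); // TextEncoder always produces UTF-8
    let binary = '';
    for (const b of bytes) {
        binary += String.fromCharCode(b);        // one character per byte, so btoa() accepts it
    }
    return btoa(binary);
}

// Plain btoa() turns 'ä' (U+00E4) into the single byte 0xE4.
const viaBtoa = btoa('ä');
// The UTF-8 route turns 'ä' into the two bytes 0xC3 0xA4.
const viaUtf8 = utf8ToBase64('ä');

console.log(viaBtoa); // 5A==
console.log(viaUtf8); // w6Q=

// A lone 0xE4 byte is not valid UTF-8, so a UTF-8 decoder
// substitutes the replacement character U+FFFD for it.
const decoded = new TextDecoder('utf-8').decode(Uint8Array.of(0xE4));
console.log(decoded === '\uFFFD'); // true
```

So the server only sees valid UTF-8 if the client encoded UTF-8 bytes in the first place, which is what b64EncodeUnicode() above does via encodeURIComponent().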