0

When I receive data from a website, I get the following string:

Peter Tester   â‚‹   Max Mustermann

The page's meta information says the encoding is UTF-8, so I wrote a small function to convert UTF-8 to Base64 (Base64 is the correct charset for default C# projects, isn't it?)

private String UTF8toBase64(string input)
{
    var bytes = Encoding.UTF8.GetBytes(input);
    return Convert.ToBase64String(bytes);
}

But this function returns a string like this:

"S3lsZSBFZG11bmQgJm5ic3A7IMOi4oCa4oC5ICZuYnNwOyZuYnNwO0ppcmkgVmVzZWx5"
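To see why the Base64 output does not help, it can simply be decoded again. The following is a minimal sketch (the class and method names are made up for illustration and are not part of the question): Base64 only re-spells bytes as ASCII characters, so reversing it gives back exactly the text that went in.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration; not part of the original question.
static class Base64Demo
{
    // Reverses UTF8toBase64: Base64 text -> raw bytes -> UTF-8 string.
    public static string Base64ToUtf8(string base64)
    {
        byte[] bytes = Convert.FromBase64String(base64);
        return Encoding.UTF8.GetString(bytes);
    }
}
```

Passing the Base64 string shown above to `Base64ToUtf8` returns text that still contains the "â‚‹" sequence, which suggests the text was already garbled before it reached `UTF8toBase64`.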
Uwe Keim
Philipp Nies
  • 2
    Base64 is for encoding data that can't otherwise be represented as text; it is not a charset. The default encoding is `Encoding.Default`, which differs between systems because it depends on the machine locale. – Gusman Mar 22 '16 at 13:06
  • If you have a string, it means you have already parsed your input stream into UTF-16. The original text probably can't be recovered from it (even if you didn't get any encoding exception...). You have to apply the _conversion_ to the input byte stream. BTW, Base64 has nothing to do with UTF-8... – Adriano Repetti Mar 22 '16 at 13:06
  • 2
    When you receive data from the web site, you're receiving *bytes*. I'd start from there. If those bytes are meant to be UTF-8-encoded text, use `Encoding.UTF8` to decode it. Base64 is unrelated here. – Jon Skeet Mar 22 '16 at 13:06
  • 1
    I receive the website and "parse" it with the HTMLAgilityPack. After parsing the page I want to filter out some values. – Philipp Nies Mar 22 '16 at 13:09

2 Answers

0

I believe you just want:

return Encoding.UTF8.GetString(bytes);
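As a rough illustration of what that call does, here is a small sketch. It assumes `bytes` holds the raw, undecoded response body; the class and method names are hypothetical. The bytes E2 82 8B are the UTF-8 encoding of U+208B (SUBSCRIPT MINUS), and the "â‚‹" in the question is consistent with those three bytes having been read as Windows-1252 instead.

```csharp
using System;
using System.Text;

// Illustrative names; only Encoding.UTF8.GetString comes from the answer.
static class DecodeDemo
{
    // Interprets the raw bytes as UTF-8, so multi-byte sequences become one character.
    public static string DecodeAsUtf8(byte[] bytes) =>
        Encoding.UTF8.GetString(bytes);

    // For comparison: a single-byte encoding turns every byte into its own character,
    // which is how UTF-8 input ends up as mojibake.
    public static string DecodeAsLatin1(byte[] bytes) =>
        Encoding.GetEncoding("ISO-8859-1").GetString(bytes);
}
```

With `new byte[] { 0xE2, 0x82, 0x8B }` as input, `DecodeAsUtf8` yields a single character, while `DecodeAsLatin1` yields three. The decoding therefore has to happen on the bytes, before the text becomes a `string`.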
Uwe Keim
Henrik
0

I found a thread with the same problem in VB

HTML encoding issues - “Â” character showing up instead of “&nbsp;”

The same function works nicely in C#. After replacing the "â‚‹" with a space, I've got a string I can work with :-)

Thanks for the help, everyone.

Regex.Replace(input, "[^\u0000-\u007F]", " ")
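The replacement above discards every non-ASCII character, including legitimate ones such as umlauts. The sketch below wraps it in a method and puts a possible alternative next to it. The alternative rests on an assumption the thread does not confirm: that the string is UTF-8 text that was mistakenly decoded as Windows-1252. All names here are hypothetical.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helpers for illustration only.
static class MojibakeDemo
{
    static readonly Encoding Windows1252 = CreateWindows1252();

    static Encoding CreateWindows1252()
    {
        // On .NET Core / .NET 5+ code page 1252 is only available after
        // registering the code pages provider.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(1252);
    }

    // The approach from the answer: replace each non-ASCII character with a space.
    public static string StripNonAscii(string input) =>
        Regex.Replace(input, @"[^\u0000-\u007F]", " ");

    // Assumption: input is UTF-8 that was misread as Windows-1252.
    // Re-encoding as Windows-1252 restores the original bytes, which are
    // then decoded as UTF-8.
    public static string RepairMojibake(string input) =>
        Encoding.UTF8.GetString(Windows1252.GetBytes(input));
}
```

`StripNonAscii` is lossy but safe on any input. `RepairMojibake` can restore the original character, but it would corrupt text that was decoded correctly in the first place, so decoding the response bytes as UTF-8 up front remains the more robust fix.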
Community
Philipp Nies