
Possible Duplicate:
Convert Latin 1 encoded UTF8 to Unicode

I want to convert latin1 (ISO-8859-1) to UTF8 in C#. What is the best way to do this?

My string is "Công ty TNHH TM và DL Việt Hương".

hainv
  • Thanks, but it is not resolved! – hainv Dec 22 '12 at 03:27
  • That is not a valid ISO-8859-1 string. There are no Vietnamese characters in Latin-1. Here's [the Latin-1 codepage](http://en.wikipedia.org/wiki/ISO/IEC_8859-1#Codepage_layout); if you don't see your character there, it's not available for conversion. – Michael Petrotta Dec 22 '12 at 03:37
  • Closing a question as a duplicate of a question that is complete nonsense (what on earth is "Latin 1 encoded UTF8"?!) seems wrong to me. While this question is flawed, as @MichaelPetrotta notes above, I'm voting to reopen it; it can't possibly be a duplicate of a question about converting "latin 1 encoded UTF8" to unicode because that's *completely meaningless*. – Mark Amery May 22 '17 at 10:16
  • While the original closure may or may not have been accurate, the question is still off-topic because it is asking a primarily opinion-based question. It is typically not worth it to reopen an off-topic question, even if you intended to *re*-close it using a better method. Voting to leave closed. – TylerH May 06 '18 at 14:59
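
The point about the Latin-1 codepage can be checked in code. The following is a minimal sketch, assuming a hypothetical helper named `FitsInLatin1` (not part of the original thread); it asks .NET for a strict ISO-8859-1 encoder that throws rather than silently substituting characters it cannot represent.

```csharp
using System;
using System.Text;

public static class Latin1Check
{
    // Hypothetical helper: returns true only if every character of s exists in ISO-8859-1.
    public static bool FitsInLatin1(string s)
    {
        // Request exception fallbacks so unrepresentable characters raise an error
        // instead of being replaced with a substitute character.
        Encoding strictLatin1 = Encoding.GetEncoding(
            "iso-8859-1",
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);
        try
        {
            strictLatin1.GetBytes(s);
            return true;
        }
        catch (EncoderFallbackException)
        {
            return false;
        }
    }

    public static void Main()
    {
        // "Công ty": ô is U+00F4, which lies inside the Latin-1 range.
        Console.WriteLine(FitsInLatin1("C\u00F4ng ty"));
        // "Việt Hương": ệ (U+1EC7), ư (U+01B0) and ơ (U+01A1) lie outside it.
        Console.WriteLine(FitsInLatin1("Vi\u1EC7t H\u01B0\u01A1ng"));
    }
}
```

Escape sequences are used for the non-ASCII characters so the result does not depend on how the source file itself is encoded.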

1 Answer


To convert Latin-1 (ISO-8859-1) to UTF-8 in C#:

Encoding.UTF8.GetString(Encoding.GetEncoding("iso-8859-1").GetBytes(s))

OR

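In C#, use System.Text:

// The source encoding passed to Encoding.Convert must match how the bytes were produced.
// For Latin-1 input, use Encoding.GetEncoding("iso-8859-1") in place of Encoding.ASCII.
byte[] asciiBytes = Encoding.ASCII.GetBytes("ASCII to UTF8");
byte[] utf8Bytes = Encoding.Convert(Encoding.ASCII, Encoding.UTF8, asciiBytes);
string utf8Converted = Encoding.UTF8.GetString(utf8Bytes);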
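
As a rough sketch of what the first one-liner does: it only helps when a string holds UTF-8 bytes that were mistakenly decoded as Latin-1. The example below simulates that mistake and then reverses it. The class and method names (`MojibakeRepair`, `Latin1ToUtf8`) are made up for illustration. If the input is already correct Unicode text, there is nothing to repair, and the Latin-1 step would instead lose every character outside Latin-1.

```csharp
using System;
using System.Text;

public static class MojibakeRepair
{
    // Re-encode the mis-decoded text as Latin-1 to recover the original bytes,
    // then decode those bytes as UTF-8.
    public static string Latin1ToUtf8(string misdecoded)
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        byte[] originalBytes = latin1.GetBytes(misdecoded);
        return Encoding.UTF8.GetString(originalBytes);
    }

    public static void Main()
    {
        string original = "Vi\u1EC7t H\u01B0\u01A1ng"; // "Việt Hương"

        // Simulate the bug: UTF-8 bytes read as if they were Latin-1.
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        string garbled = latin1.GetString(Encoding.UTF8.GetBytes(original));

        // The garbled text differs from the original; the repair restores it.
        Console.WriteLine(garbled == original);
        Console.WriteLine(Latin1ToUtf8(garbled) == original);
    }
}
```

The round trip works because Latin-1 maps each byte value 0x00 to 0xFF to the code point with the same number, so no byte is lost in either direction.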

Source:

Convert Latin 1 encoded UTF8 to Unicode

C# Convert string from UTF-8 to ISO-8859-1 (Latin1) H

Eric Leschinski
  • My string is "Công ty TNHH TM và DL Việt Hương". When using http://www.unicodetools.com/unicode/utf8-to-latin-converter.php the result is correct, but when using Encoding.UTF8.GetString(Encoding.GetEncoding("iso-8859-1").GetBytes(s)) it is incorrect. – hainv Dec 22 '12 at 03:28
  • The website's Latin-1 to UTF-8 converter must use an algorithm that can handle non-Latin-1 characters. So it appears C# is less tolerant when fed non-Latin-1 characters, whereas the website is able to make an educated guess about the invalid characters. The question becomes: which algorithm is the website using, and in which language is it written? – Eric Leschinski Dec 22 '12 at 03:46
  • Yeah, that site doesn't do a great job with the OP's string either. Not that I'd expect it to - without knowing the source codepage, it comes down to guesswork. I certainly wouldn't call it "correct". – Michael Petrotta Dec 22 '12 at 04:04
  • The website doesn't convert anything. Your string was OK to begin with; only the HTML needed to be unescaped. Because the site is in HTML, the characters show correctly without any conversion at all. – Esailija Dec 22 '12 at 07:20
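
If the last comment is right and the string only needs HTML unescaping, a sketch along the following lines may be all that is required. The escaped input is a guess at what the original data looked like (the thread never shows it), and the `HtmlUnescape` wrapper is hypothetical. `WebUtility.HtmlDecode` is the standard .NET call in `System.Net`.

```csharp
using System;
using System.Net;

public static class HtmlUnescape
{
    // Thin wrapper around the framework's HTML entity decoder.
    public static string Unescape(string html)
    {
        return WebUtility.HtmlDecode(html);
    }

    public static void Main()
    {
        // Assumed input: the Vietnamese text with non-ASCII characters written
        // as numeric character references (for example, &#244; for ô).
        string escaped = "C&#244;ng ty TNHH TM v&#224; DL Vi&#7879;t H&#432;&#417;ng";
        string expected = "C\u00F4ng ty TNHH TM v\u00E0 DL Vi\u1EC7t H\u01B0\u01A1ng";

        Console.WriteLine(Unescape(escaped) == expected);
    }
}
```

No encoding conversion is involved here: the decoded result is an ordinary .NET string, which can be written out as UTF-8 by whatever writes the final output.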