I am not a professional developer, and I am having a problem converting Unicode text to ANSI for a legacy application that doesn't support Unicode.

Here's a sample of what Unicode-encoded text looks like when displayed in that legacy application:
Ã€ chaque journÃ©e des quatre jours de colloque, entre 250 et 500 personnes sont venues assister en continu aux discussions de cette rencontre. Cette affluence, ainsi que la richesse et la variÃ©tÃ© des discussions engagÃ©es lors de ces confÃ©rences, confirment la nÃ©cessitÃ© d'un espace ouvert pour les pensÃ©es critiques dans le monde francophone, Ã  l'universitÃ© et bien au-delÃ .
I notice the following things:
- All diacritic characters are encoded as C3 ("Ã") + a second byte
- The character "à" is wrongly encoded as C320 ("Ã ")
- Windows' CharacterMap application says that "é" is "U+00E9" while the document contains C3A9 instead.
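For reference, a minimal console sketch like the one below (the module name is only illustrative, not part of my application) would dump the bytes of "é" in UTF-8 and UTF-16 for comparison:

    Imports System.Text

    Module ByteDump
        Sub Main()
            ' "é" encoded as UTF-8 gives the two bytes C3 A9
            Dim utf8Bytes As Byte() = Encoding.UTF8.GetBytes("é")
            Console.WriteLine(BitConverter.ToString(utf8Bytes))   ' prints C3-A9

            ' "é" encoded as UTF-16LE gives E9 00, matching the code point
            ' U+00E9 that CharacterMap shows
            Dim utf16Bytes As Byte() = Encoding.Unicode.GetBytes("é")
            Console.WriteLine(BitConverter.ToString(utf16Bytes))  ' prints E9-00
        End Sub
    End Module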
I have a couple of questions:
- Why the difference between the document and CharacterMap: is the document encoded in something other than Unicode? For instance, why is "é" encoded as C3A9 instead of 00E9?
- I use the following VB.Net code to convert the document from Unicode to ANSI; how can I replace all occurrences of C320 with "à"?

    Dim Encw1252 As Encoding = Encoding.GetEncoding("windows-1252")
    Dim EncUTF8 As Encoding = Encoding.GetEncoding("utf-8")
    Dim Str As String
    Str = Encw1252.GetString(Encoding.Convert(EncUTF8, Encw1252, Encoding.Default.GetBytes(Clipboard.GetText)))
    Clipboard.SetText(Str)