0

I have an address with French chars - Vétéran. In table column it is read as Vétéran when using SMS select. The server language is English.

When I copied Vétéran to Word, it remained the same. I saved the Word doc as plain text using Windows default encoding (Western Europe), it is changed to Vétéran.

I cannot find a way to display it correctly in SMS. The data is copied from Oracle 12. It is displayed as Vétéran by Oracle SQL Developer.

Need some help.

Thanks,

Will

I tried different case and accent setting in collate. Does not work.

JNevill
  • 46,980
  • 4
  • 38
  • 63
  • What is the column type? – GSerg Jan 27 '23 at 18:54
  • It is VARCHAR(MAX). It is the text of an XML file. I got it from a BLOB in Oracle. – William Li Jan 27 '23 at 19:08
  • So it already contains broken text. – GSerg Jan 27 '23 at 19:15
  • I cannot say it is broken. Some software can display correctly. Word can change the junk chars to French. There is something I do not know. – William Li Jan 27 '23 at 19:23
  • 3
    They are not junk chars, they are [mojibake](https://devblogs.microsoft.com/oldnewthing/20140930-00/?p=43953). Most likely UTF-8 interpreted as Latin1. – GSerg Jan 27 '23 at 19:27
  • 1
    Try calling [this function](https://stackoverflow.com/a/56181559/11683) on each such field, as in `select dbo.Convert_utf8(cast('Vétéran' as varbinary(8000)))`. – GSerg Jan 27 '23 at 19:31
  • A somewhat less mystical way to decode it with XML would be something like the following: `select cast('' + 'Vétéran' as xml).value('text()[1]', 'varchar(50)');`. Probably the `encoding="utf-8"` DTD attribute wasn't included in the original XML file from Oracle, or some intermediate process stripped it out. – AlwaysLearning Jan 27 '23 at 23:37
  • I got error when casting some rows to XML. There could be way to convert varbinary to utf8 directly without using XML. – William Li Jan 28 '23 at 03:57
  • Well, yes. Your fields contain binary utf-8 data that is being interpreted as ASCII. So you need to cast it to varbinary and process with a tool that can interpret the bytes as utf-8. You can make a dump, process with an external tool and import back. Or you could have a CLR function that accepts a varbinary and calls `Encoding.UTF8.GetString` on it. – GSerg Jan 28 '23 at 13:34
  • I created a python program to do the fix. Not all lines are encoded wrong. Have to deal with those also. def convertToUTF8(origText:str): try: UTF8Text=origText.encode("windows-1252").decode("utf-8") return UTF8Text except: return origText – William Li Jan 29 '23 at 14:38

0 Answers0