SQL: Update Unicode Data in column to Accented characters

Question

I have a column which was set to Varchar and the database set to SQL_Latin1_General_CP1_CI_AS.

When a user entered their name into our web front end and save the data, it was not saving accented characters correctly.

The web user was entering the following, "Béala" but this was being saved on the database as the following, "BÃ©ala".

I believe that changing the column from Varchar to NVarchar should prevent this from happening going forward(?), however, I have two questions.

1) How do I perform a select on the existing data in the column and display it correctly?

select CONVERT(NVARCHAR(100),strAddress1) from [dbo].[tblCustomer]

This still shows the data incorrectly.

2) How do I update the data in the column once converted to NVarchar to save the accented characters correctly?

Many thanks, Ray.

@a_horse_with_no_name - thanks for the change suggestion to the tag. — raymantle, Jan 11 '19 at 12:30
Your application has saved mojibake -- UTF-8 encoded data stored as a Windows-1252 encoded string. Prior to SQL Server 2019, the engine had no support for UTF-8 at all, so fixing this in pure T-SQL is essentially not possible. In .NET, you'd fix this with `Encoding.UTF8.GetString(Encoding.GetEncoding("Windows-1252").GetBytes("BÃ©ala"))`. Changing the storage type to `NVARCHAR` *may* make the client app work correctly, but it's actually just as possible that you get the same, incorrectly encoded result (or a different corruption). You may need to fix how data gets sent to the DB. — Jeroen Mostert, Jan 11 '19 at 12:31
OK, so I was wrong about "not possible" -- it's just [very cumbersome](https://stackoverflow.com/a/28412587/4137916), but someone has done the legwork. This is just for fixing data stored incorrectly, though -- it should not be used to work around the problem on the client side, which needs its own fix. — Jeroen Mostert, Jan 11 '19 at 12:52

score 0 · Answer 1 · answered Jan 11 '19 at 13:02

The only idea that came to my mind is that you have to prepare an update that will fool this badly loaded data, i.e. a sign 'Ã©' will always match exactly one character (in this case 'é'), you have to catch all special characters and this as have been changed (just a simple update with cases and replace). Of course, the first column must be of the nvarchar type. It solves the problem 1 and 2 (the data will be correct in the table, the data will be displayed correctly, I described the update above)

Muhammad Waheed · Answer 2 · 2019-01-11T12:39:27.517

-1

Here is way to get it in normal characters scheme.

select 'Réunion', cast('Réunion' as varchar(100)) COLLATE SQL_Latin1_General_CP1253_CI_AI

Moreover to check all possible collations in SQL Server you can try this query


SELECT name, description
  FROM sys.fn_helpcollations();

edited Jan 11 '19 at 12:39

answered Jan 11 '19 at 12:33

Muhammad Waheed

1,048
1
13
30

1

The question is not "how do we strip accents from Latin characters". `Réunion` can be stored without loss in SQL_Latin1_General_CP1_CI_AS. Even if we wanted to strip the accents, this will still fail to do anything about the existing data -- it will just corrupt it even further to `BA©ala`. The column collation is not the problem here. – Jeroen Mostert Jan 11 '19 at 12:41
That did not work. `select 'BÃ©ala', cast('BÃ©ala' as varchar(100)) COLLATE SQL_Latin1_General_CP1253_CI_AI` Got the following BA©ala – raymantle Jan 11 '19 at 12:48

SQL: Update Unicode Data in column to Accented characters

2 Answers2