Update special characters from column in TSQL

Question

It's been a while since we migrated from MySql to SQL Server 2017.

We have a table that contains email html content, the problem we are facing is the characters like ' are stored into SQL DB like this â€˜, I tried to change the collation to rectify this but it did not worked.

Current collation is SQL_Latin1_General_CP1_CI_AS.

I have tried many things like update the table column collation/charset to utf8, but nothing seems to work. I don't know how many special characters in my column table which contains lots of email templates.

Strange Characters in database text: Ã, Ã, ¢, â‚ €,

https://learn.microsoft.com/en-us/answers/questions/84241/strange-query-result-from-sql-server

Convert text value in SQL Server from UTF8 to ISO 8859-1

So far I have found these special characters â€˜, â€™, â€”, â€œ ,â€, Â. As per my understanding there can be few solutions:

Find each special characters manually and update each special character with SQL statement.
There can be a regex to pull all the special characters and replace them with correct ones, but that to pulls all other html characters.
If possible update the collation/charset of table column, that may possibly fix these characters but updating the collation/charset does not fix the issue in my case.

UPDATE 1: The data type to column is nvarchar(-1)

I am open to any solutions that can solve my problem, any help would be appreciated.

Is this data still continuing to appear in your database, or is this all historical? if the former, this would suggest that there's an encoding problem in your application too; I'd *guess* somewhere it's being implicitly converted from UTF-8 to code page 1252 (`â€˜` saved in ANSI 1252 and then opened in UTF-8 results in `‘`). — Thom A, Mar 01 '23 at 12:23
Also, in regards to changing the table to UTF-8; you can't (not in 2017). UTF-8 collations weren't added until SQL Server 2019. If you need non ANSI characters (or characters outside of the codepage), then you'll need to use an `nvarchar` which uses UCS-2 or UTF-16 depending on if the collation has Supplementary Characters. — Thom A, Mar 01 '23 at 12:25
@DanGuzman I have updated the question and added the column data type which is `nvarchar`. — Prawal, Mar 01 '23 at 12:34
Aside... UTF-8 is an encoding for the entire range of Unicode code points. There's no UTF-8 to iso-8859-1 conversion that isn't "lossy," i.e.: you'll lose any code points that don't exist in iso-8859-1. — AlwaysLearning, Mar 01 '23 at 12:36
@Larnu The data in table already exist(historical) which causes issue and when entering new data works fine. — Prawal, Mar 01 '23 at 12:36

score 0 · Answer 1 · answered Mar 01 '23 at 14:09

We recently did the same thing a few years ago, migrating from MySQL to SQL Server 2017, and we left the collation as SQL_Latin1_General_CP1_CI_AS. We've found we don't get special/error characters if we pass strings with ASCII/Unicode characters.

I don't remember if UTF-8 also works, but you essentially need to make sure your data inputs are good otherwise you're going to get those characters. We have to double-check our information coming from an international supplier.

You can find more information about Collation here: SQL Server 2017 Collation

Update special characters from column in TSQL

1 Answers1