1

It's been a while since we migrated from MySql to SQL Server 2017.

We have a table that contains email html content, the problem we are facing is the characters like ' are stored into SQL DB like this ‘, I tried to change the collation to rectify this but it did not worked.

Current collation is SQL_Latin1_General_CP1_CI_AS.

I have tried many things like update the table column collation/charset to utf8, but nothing seems to work. I don't know how many special characters in my column table which contains lots of email templates.

Strange Characters in database text: Ã, Ã, ¢, â‚ €,

https://learn.microsoft.com/en-us/answers/questions/84241/strange-query-result-from-sql-server

Convert text value in SQL Server from UTF8 to ISO 8859-1

So far I have found these special characters ‘, ’, —, “ ,â€, Â. As per my understanding there can be few solutions:

  1. Find each special characters manually and update each special character with SQL statement.

  2. There can be a regex to pull all the special characters and replace them with correct ones, but that to pulls all other html characters.

  3. If possible update the collation/charset of table column, that may possibly fix these characters but updating the collation/charset does not fix the issue in my case.

UPDATE 1: The data type to column is nvarchar(-1)

I am open to any solutions that can solve my problem, any help would be appreciated.

Prawal
  • 58
  • 6
  • Add the data type of the problem column to your question. – Dan Guzman Mar 01 '23 at 12:16
  • What is the replacement value(s) ? – Tim Biegeleisen Mar 01 '23 at 12:17
  • Is this data still continuing to appear in your database, or is this all historical? if the former, this would suggest that there's an encoding problem in your application too; I'd *guess* somewhere it's being implicitly converted from UTF-8 to code page 1252 (`‘` saved in ANSI 1252 and then opened in UTF-8 results in `‘`). – Thom A Mar 01 '23 at 12:23
  • 1
    Also, in regards to changing the table to UTF-8; you can't (not in 2017). UTF-8 collations weren't added until SQL Server 2019. If you need non ANSI characters (or characters outside of the codepage), then you'll need to use an `nvarchar` which uses UCS-2 or UTF-16 depending on if the collation has Supplementary Characters. – Thom A Mar 01 '23 at 12:25
  • @DanGuzman I have updated the question and added the column data type which is `nvarchar`. – Prawal Mar 01 '23 at 12:34
  • Aside... UTF-8 is an encoding for the entire range of Unicode code points. There's no UTF-8 to iso-8859-1 conversion that isn't "lossy," i.e.: you'll lose any code points that don't exist in iso-8859-1. – AlwaysLearning Mar 01 '23 at 12:36
  • @Larnu The data in table already exist(historical) which causes issue and when entering new data works fine. – Prawal Mar 01 '23 at 12:36

1 Answers1

0

We recently did the same thing a few years ago, migrating from MySQL to SQL Server 2017, and we left the collation as SQL_Latin1_General_CP1_CI_AS. We've found we don't get special/error characters if we pass strings with ASCII/Unicode characters.

I don't remember if UTF-8 also works, but you essentially need to make sure your data inputs are good otherwise you're going to get those characters. We have to double-check our information coming from an international supplier.

You can find more information about Collation here: SQL Server 2017 Collation