4

In my PHP script, I made a connection with the MS SQL database server with the following code,

$connectionInfo = array( "Database"=>$database,"UID"=>$uid, "PWD"=>$pwd);
$conn = sqlsrv_connect( $serverName, $connectionInfo);

Mistakenly I did forget to specify the "CharacterSet"=>"UTF-8" in the $connectionInfo. Due to this reason, some Spanish and other characters have been encoded wrong. For example, "álgebra" is stored as "álgebra". Now that I have set the proper character set during the connection to the database, new data is stored correctly. But how can I restore the original strings that had been encoded wrongly and stored already?

Samiul Alam
  • 2,174
  • 4
  • 11
  • 29
  • 3
    *"But how can I restore the original strings that had been encoded wrongly and stored already?"* You can't, the data has already been lost. You would need to reinsert the original data or `UPDATE` the existing rows while referring to the original source. – Thom A Sep 08 '20 at 12:16
  • Isn't there any algorithm that can restore the original strings? – Samiul Alam Sep 08 '20 at 12:18
  • 2
    No, there is no algorithm or magic which can restore the original strings. – Jay Blanchard Sep 08 '20 at 12:27
  • 1
    Data loss from incorrect code pages is in many way identical to truncation, @SamiulAlam . Unless you have a copy of the *original* information somewhere then the data that has been lost would remain that way. – Thom A Sep 08 '20 at 12:38

2 Answers2

1

Most likely you haven't lost anything. Just convert the string/column to binary and then convert from binary to string with the proper encoding.

--2019
select cast(0xC3A16C6765627261 as varchar(100));

declare @t table(thechar varchar(100) collate Latin1_General_100_CI_AI_SC_UTF8)
insert into @t (thechar) values (0xC3A16C6765627261);

select *
from @t;
lptr
  • 1
  • 2
  • 6
  • 16
  • 1
    This collation gives this error: `Invalid collation 'Latin1_General_100_CI_AI_SC_UTF8'.` – Samiul Alam Sep 08 '20 at 13:26
  • 1
    take a look at https://stackoverflow.com/questions/28168055/convert-text-value-in-sql-server-from-utf8-to-iso-8859-1 ... equivalent to Xabi's answer – lptr Sep 08 '20 at 13:52
  • @SamiulAlam That should be because UTF8 collations are supported from SQL Server 2019 onwards – Joril Apr 12 '21 at 12:16
0

Sometimes I use this tiny function to convert from UTF8 that may help you:

create function FromUtf8(@src varchar(8000)) returns varchar(8000) as
begin
    declare @c char, @i int
    select @i = patIndex('%[ÂÃ][€-¿]%', @src collate Latin1_General_BIN)
    while @i > 0
        select  @c = char(((ascii(substring(@src, @i, 1)) & 31) * 64)
                         + (ascii(substring(@src, @i + 1, 1)) & 63)),
                @src = stuff(@src, @i, 2, @c),
                @i = patIndex('%[ÂÃ][€-¿]%', @src collate Latin1_General_BIN)
    return @src
end
Xabi
  • 465
  • 5
  • 8