Need a .NET cast / convert of the SQL nchar to char.
More specifically cast of the nchar UNICODE to the char ASCII.
Where this is complicated is SQL char uses the full byte.
Not the pure ASCII of 128.
The TSQL function ASCII returns 0-255.
Ideally there would be a NormalizationForm of FormByte.
It would not be an exact textual value - rather a close logical value or ?.
And SQL would use the FormByte to cast from nchar to char.
NormalizationForm
Encode Decode did not work for me and I tried all flavors.
In SQL many char (byte) are mapped to 63.
63 is ?.
Not just char over 255 that are mapped to 63.
130 to 140 are all mapped to 63.
Characters 160-255 all return 160-255
Over 255 not all are mapped to 63.
For example many diacritics are mapped to the ASCII.
TSQL has UNICODE and ACSII functions.
So I just loaded all Unicode characters into both char and nchar columns.
The char returned by SQL is wrong for 29 characters.
And the ASCII() returned for the bad chars does not make sense - all control chars in the 130 - 160 range.
Checked the binary for the incorrect 29 and what is is stored is what is returned by ASCII().
For 27 what is being return from char is nchar and for 2 it is not even the correct nchar. They should all be mapped to ? or a an ACSII equivalent.
“ and ” mapped to " (but will take ?)
‘ and ’ mapped to '
– (en dash) and — (em dash) mapped to -
...
I know you don't believe me.
Insert 'Œ' into a char column and select it - it will return 'Œ'.
And you can even search on it - char = 'Œ' returns true.
Select ASCII('Œ') returns 140 and that is what is actually stored (check the binary).
The UNICODE definition of 140 / 8C is Partial Line Backward.
I checked the binary value for that char and it is 8C (140).
What is returned is the unicode 'Œ' Int16 338.
It appears SQL is doing some input output mapping and getting it wrong.
ASCII function is correct for 575 unicode characters that are not mapped to ?.
The char value matches the ACSII and they all make sense.
E.G. 12 different forms of u are all mapped to u.
32163 characters other than ? are mapped to ? (63).
Below are the 29 char that return the wrong value.
Column order:
char
nchar
ASCII(char)
UNICODE(nchar)
sqlCharASCIIbackToString did not match Œ Œ 140 338
sqlCharASCIIbackToString did not match œ œ 156 339
sqlCharASCIIbackToString did not match Š Š 138 352
sqlCharASCIIbackToString did not match š š 154 353
sqlCharASCIIbackToString did not match Ÿ Ÿ 159 376
sqlCharASCIIbackToString did not match Ž Ž 142 381
sqlCharASCIIbackToString did not match ž ž 158 382
sqlCharASCIIbackToString did not match ƒ Ƒ 131 401
sqlCharASCIIbackToString did not match ƒ ƒ 131 402
sqlCharASCIIbackToString did not match ˆ ˆ 136 710
sqlCharASCIIbackToString did not match ˜ ˜ 152 732
sqlCharASCIIbackToString did not match – – 150 8211
sqlCharASCIIbackToString did not match — — 151 8212
sqlCharASCIIbackToString did not match ‘ ‘ 145 8216
sqlCharASCIIbackToString did not match ’ ’ 146 8217
sqlCharASCIIbackToString did not match ‚ ‚ 130 8218
sqlCharASCIIbackToString did not match “ “ 147 8220
sqlCharASCIIbackToString did not match ” ” 148 8221
sqlCharASCIIbackToString did not match „ „ 132 8222
sqlCharASCIIbackToString did not match † † 134 8224
sqlCharASCIIbackToString did not match ‡ ‡ 135 8225
sqlCharASCIIbackToString did not match • • 149 8226
sqlCharASCIIbackToString did not match
… … 133 8230
sqlCharASCIIbackToString did not match ‰ ‰ 137 8240
sqlCharASCIIbackToString did not match ‹ ‹ 139 8249
sqlCharASCIIbackToString did not match › › 155 8250
sqlCharASCIIbackToString did not match € € 128 8364
sqlCharASCIIbackToString did not match ™ ™ 153 8482
sqlCharASCIIbackToString did not match ˜ ≈ 152 8776
count63 = 32163 countMis = 29 countCorrect = 575
Ran the following .NET to see which 'Œ' if being returned by SQL
char char338 = (char)338;
System.Diagnostics.Debug.WriteLine(char338);
sqlCmd.CommandText = "select [char] from [charNchar] where [char] = @char;";
sqlCmd.Parameters.Add("@char", SqlDbType.Char).Value = char338;
string string338= sqlCmd.ExecuteScalar().ToString();
char338 = string338.ToCharArray()[0];
System.Diagnostics.Debug.WriteLine(char338 + " " + ((Int16)char338).ToString());
The above code returns Œ 338.
SQL is returning a value larger than byte to datatype is supposedly stored as byte.
If I search on (char)140 then ? 63 is returned.
What is interesting is a search on 'Œ' versus N'Œ' on char produce different results.
That is searching on the left (140) Œ.
Search on the right (338) Œ the char search finds nothing.
Nchar finds both results with either input.
SELECT [int16],RTRIM([char]) as [char], ASCII([char]) as 'ASCII'
,RTRIM([nchar]) as [nchar], UNICODE([nchar]) as 'UNICODE'
FROM [test].[dbo].[charNchar]
where [char] = 'Œ'
SELECT [int16],RTRIM([char]) as [char], ASCII([char]) as 'ASCII'
,RTRIM([nchar]) as [nchar], UNICODE([nchar]) as 'UNICODE'
FROM [test].[dbo].[charNchar]
where [char] = N'Œ'
SELECT [int16],RTRIM([char]) as [char], ASCII([char]) as 'ASCII'
,RTRIM([nchar]) as [nchar], UNICODE([nchar]) as 'UNICODE'
FROM [test].[dbo].[charNchar]
where [nchar] = 'Œ'
SELECT [int16],RTRIM([char]) as [char], ASCII([char]) as 'ASCII'
,RTRIM([nchar]) as [nchar], UNICODE([nchar]) as 'UNICODE'
FROM [test].[dbo].[charNchar]
where [nchar] = N'Œ'
int16 char ASCII nchar UNICODE
------ -------------------------------------------------- ----------- -------------------------------------------------- -----------
338 Œ 140 Œ 338
int16 char ASCII nchar UNICODE
------ -------------------------------------------------- ----------- -------------------------------------------------- -----------
338 Œ 140 Œ 338
339 œ 156 œ 339
int16 char ASCII nchar UNICODE
------ -------------------------------------------------- ----------- -------------------------------------------------- -----------
338 Œ 140 Œ 338
339 œ 156 œ 339
int16 char ASCII nchar UNICODE
------ -------------------------------------------------- ----------- -------------------------------------------------- -----------
338 Œ 140 Œ 338
339 œ 156 œ 339
≈ search finds nothing with any of the four queries. Checked the charts and that is the correct char for 8776 and is math almost equal to.
˜ is zero width pasted into SSMS but it is something as if it is pasted into the FROM the blue turn to black.
Am I missing something - this seems like a bug to me.
It is not just he wrong value it is an invalid value.
An Int16 is returned.
Lets say I wanted to use byte to store character to save space - it would break on SQL char as 29 chars are not returned as byte.
Here is the code I used:
public void SQLchar()
{
SqlConnection sqlCon = new SqlConnection(connString);
try
{
sqlCon.Open();
SqlCommand sqlCmd = sqlCon.CreateCommand();
SqlDataReader rdr;
sqlCmd.CommandText = "delete charNchar";
sqlCmd.ExecuteNonQuery();
for(Int16 i = 0; i < Int16.MaxValue; i ++)
{
sqlCmd.CommandText = "insert into charNchar (int16,char,nchar) values (@int16, @char, @nchar);";
sqlCmd.CommandType = System.Data.CommandType.Text;
sqlCmd.Parameters.Clear();
sqlCmd.Parameters.Add("@int16", SqlDbType.Int).Value = i;
sqlCmd.Parameters.Add("@char", SqlDbType.Char).Value = (char)i;
sqlCmd.Parameters.Add("@nchar", SqlDbType.NChar).Value = (char)i;
sqlCmd.ExecuteNonQuery();
}
string sqlChar;
string sqlNChar;
Int16 sqlCharASCII;
Int16 sqlNCharUnicode;
string sqlCharASCIIbackToString;
sqlCmd.CommandText = "select char,nchar,ASCII(char),UNICODE(nchar) from charNchar order by int16;";
rdr = sqlCmd.ExecuteReader();
Int16 count63 = 0;
Int16 countMis = 0;
Int16 countCorrect = 0;
while (rdr.Read())
{
sqlChar = rdr.IsDBNull(0) ? "dbNull" : rdr.GetString(0);
sqlNChar = rdr.IsDBNull(1) ? "dbNull" : rdr.GetString(1);
sqlCharASCII = rdr.IsDBNull(2) ? Int16.Parse("-1") : (Int16)rdr.GetInt32(2);
sqlNCharUnicode = rdr.IsDBNull(3) ? Int16.Parse("-1") : (Int16)rdr.GetInt32(3);
if(sqlCharASCII == 63 && sqlNCharUnicode != 63)
{
count63 ++;
continue; // ?
}
if (sqlCharASCII < 0)
{
System.Diagnostics.Debug.WriteLine("ASCII(char) null for " + sqlChar + " " + sqlNChar);
}
else
{
sqlCharASCIIbackToString = ((char)sqlCharASCII).ToString();
if (string.CompareOrdinal(sqlChar, sqlCharASCIIbackToString) != 0)
{
countMis++;
System.Diagnostics.Debug.WriteLine(" sqlCharASCIIbackToString did not match " + sqlCharASCIIbackToString + " " + sqlChar + " " + sqlNChar + " " + sqlCharASCII + " " + sqlNCharUnicode);
}
else
{
countCorrect++;
}
}
}
rdr.Close();
System.Diagnostics.Debug.WriteLine("count63 = " + count63.ToString() + " countMis = " + countMis.ToString() + " countCorrect = " + countCorrect.ToString());
}
catch (Exception Ex)
{
System.Diagnostics.Debug.WriteLine(Ex.Message);
}
finally
{
sqlCon.Close();
}
}
As for why.
Parse string data in .NET and that data is a FK.
Rather than round trip to SQL to get the ID for the FK use a .NET Dictionary for speed.
The Dictionary is a reverse lookup to to get the key from the value.
The parser has the Int16 of the char as that is already used by the parser.
So if the ASCII of the char is wrong then the reverse lookup fails.
I think I could hard code fixes for the incorrect ASCII results.
But I want to understand what is going on here before I go down a path that starts with patches.
Does char have some fundamental flaws?
Could just use nchar but we prefer char.
The nature of the application is we want matches.
6 diacritics of u to all match ascii u is a good thing.