15

in my database I have this char �. I want to locate them with a query

Select * 
from Sometable 
where somecolumn like '%�%'

this gets me no result.

I think it is ANSI encoding

Simon
  • 159
  • 5
  • 1
    is the column varchar/char or nvarchar/nchar? – gsharp Apr 18 '17 at 14:15
  • What version of SQL Server are you using? – Radu Gheorghiu Apr 18 '17 at 14:15
  • 15
    There is more than a single character that will be represented with �, all that doesn't have a representation on your code page. Although you see the same graphical representation for all of them, they still are different characters and their comparison will return false. – Marc Guillot Apr 18 '17 at 14:44
  • the column is nvarchar – Simon Apr 18 '17 at 14:57
  • 3
    "ANSI encoding" is an [incredibly meaningless statement](https://en.wikipedia.org/wiki/ANSI_character_set). Go read [Joel's blog on character sets](https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/), then use UTF-8 without a byte order mark as much as possible. – jpmc26 Apr 18 '17 at 20:11
  • You might simply have data that is stored as (or converted to) some kind of 8-bit encoding (usually one of the ISO-8859-* or something similar) which is then sent to a client that expects UTF-8. In this cas, it'll take the character as the first by of an UTF-8 multi-byte sequence, then fail on the rest, and show this characters instead. Make sure you have consistent encoding all over (UTF-8 recommended), and that you don't convert from one encoding to another where not appropriate (take special care of avoiding double-conversion). – jcaron Apr 18 '17 at 22:45
  • After you find � in your database, find what put it there and fix it. It's most likely data loss due to mishandling text by reading text with a character encoding other than what it was written with or by writing it with an encoding for a character set that doesn't include all the characters in the text. (Though, we have certainly just put a lot of valid uses of � in SO's database.) – Tom Blodget Apr 19 '17 at 01:11

3 Answers3

14

use N like below

 where col like N'%�%'

why do you think ,you need N prefix:

Prefix Unicode character string constants with the letter N. Without the N prefix, the string is converted to the default code page of the database. This default code page may not recognize certain characters.

Thanks to Martin Smith,Earlier i tested only with one character earlier and it worked,but as Martin pointed out, it returns all characters..

Below query works and returns only intended

select * from #demo where id  like N'%�%' 
COLLATE Latin1_General_100_BIN

Demo:

create table #demo
(
id nvarchar(max)
)

insert into #demo
values
(N'ﬗ'),
( N'�')

to know more about unicode,please see below links

http://kunststube.net/encoding/

https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/

Community
  • 1
  • 1
TheGameiswar
  • 27,855
  • 8
  • 56
  • 94
  • 1
    Even with the edit to add the `COLLATE` option, this answer is still incorrect as it now only matches that one single character, yet there are others that are not the same Code Point yet will display with the character in question. Try the following query to see: `SELECT NCHAR(0xD808), CASE WHEN NCHAR(0xD808) LIKE N'%�%' COLLATE Latin1_General_100_BIN2 THEN 'YES' ELSE 'NO' END;`. – Solomon Rutzky Apr 19 '17 at 05:55
6

This is the Unicode replacement character symbol.

It could match any of 2,048 invalid code points in the UCS-2 encoding (or the single character U+FFFD for the symbol itself).

You can use a range and a binary collate clause to match them all (demo).

WITH T(N)
AS 
(
SELECT TOP 65536 NCHAR(ROW_NUMBER() OVER (ORDER BY @@SPID))
FROM master..spt_values v1, 
     master..spt_values v2
)
SELECT N 
FROM T
WHERE N LIKE '%[' +  NCHAR(65533) + NCHAR(55296) + '-' + NCHAR(57343) + ']%' COLLATE Latin1_General_100_BIN
Martin Smith
  • 438,706
  • 87
  • 741
  • 845
  • Only thing I would change here is that those 2,048 code points aren't specifically invalid, they are just required to be used in certain combinations, and do not individually represent anything. – Solomon Rutzky Apr 19 '17 at 06:04
  • @srutzky - My understanding was that UCS-2 doesn't support surrogate pairs. – Martin Smith Apr 19 '17 at 06:46
  • Martin: Yes, UCS-2 doesn't have mappings for surrogate pairs, but all the high and low surrogate code points are reserved for that purpose (implemented in UTF-16). Fortunately, SQL Server is UTF-16 (Little Endian), just like .NET and Windows. You can see this by executing: `SELECT NCHAR(0xD83C), NCHAR(0xDF78), NCHAR(0xD83C) + NCHAR(0xDF78), NCHAR(0x01F378);`. Individually the code points are meaningless, but in the right combination they will display correctly. And if you are in a DB with a default collation ending with `_SC`, then that 4th field will show the character instead of `NULL` . – Solomon Rutzky Apr 19 '17 at 15:45
5

You can use ASCII to find out the ascii code for that char

Select ascii('�')

And use CHAR to retrieve the char from that code and combine it in a LIKE expression

Select * from Sometable
where somecolumn like '%'+CHAR(63)+'%'

Note the collation you use can affect the result. Also it depends on the encoding used by your application to feed your data (UTF-8, UNICODE, etc). also how you store it VARCHAR, or NVARCHAR has a last say on what you see.

There's more here in this similar question

EDIT @Mark

try this simple test:

create table sometable(somecolumn nvarchar(100) not null)
GO

insert into sometable
values
 ('12345')
,('123�45')
,('12345')
GO

select * from sometable
where somecolumn like '%'+CHAR(63)+'%'
GO

This only means that character was stored win the as a "?" in this test.

When you see a � it means the app where you are seeing isn't quite sure what to print out.

It also mean OP probably needs to find out what char is that using a query. Also note it means a string outputted like ��� can be 3 formed by different characters.

CHAR(63) was just an example, but you are right this in the ASCII table will be a standard interrogation.

EDIT @Bridge

Not with time right now to deep dig in it but the below test don't worked

Select ascii('�'), CHAR(ascii('�')), UNICODE(N'�'), CHAR(UNICODE(N'�'))
GO

create table sometable(somecolumn nvarchar(100) not null)
GO

insert into sometable
values
 ('12345')
,('123�45')
,('12345')
,('12'+NCHAR(UNICODE(N'�'))+'345')
GO

select * from sometable
where somecolumn like '%'+CHAR(63)+'%'

select * from sometable
where somecolumn like '%'+NCHAR(UNICODE(N'�'))+'%'

GO
Community
  • 1
  • 1
jean
  • 4,159
  • 4
  • 31
  • 52
  • `char(63)` is a `?`, not a `�`. – Mark Rotteveel Apr 18 '17 at 14:35
  • 4
    This character is outside the ASCII range - you'll need to use `UNICODE` instead. See the results of running this: `Select ascii('�'), CHAR(ascii('�')), UNICODE(N'�'), NCHAR(UNICODE(N'�'))` – Bridge Apr 18 '17 at 15:34