1

I have some special unicode characters, like bullets and squares in a table. When I do a select, I would like to remove the special characters but still return the rest of the text.

Is there a way to do this?

KyleMit
  • 30,350
  • 66
  • 462
  • 664
user441365
  • 3,934
  • 11
  • 43
  • 62
  • 1
    By "ignore", do you mean to not return them in the results or to construct a WHERE clause that ignores them? – Mark Wilkins Jun 09 '11 at 16:58
  • I want to return the text but removing the characters – user441365 Jun 09 '11 at 17:01
  • 2
    @user441365: If know the exact/specific characters, you could use the `replace` scalar function maybe. – Mark Wilkins Jun 09 '11 at 17:05
  • @Mark Wilkins yes but I'm looking for something more generic – user441365 Jun 09 '11 at 17:06
  • 1
    Perhaps you can modify Mastros function from this question: http://stackoverflow.com/questions/1007697/how-to-strip-all-non-alphabetic-characters-from-string-in-sql-server – ypercubeᵀᴹ Jun 09 '11 at 23:40
  • @ypercube - yeah, it's the better approach –  Jun 10 '11 at 12:19
  • Possible duplicate of [How to strip all non-alphabetic characters from string in SQL Server?](https://stackoverflow.com/questions/1007697/how-to-strip-all-non-alphabetic-characters-from-string-in-sql-server) – KyleMit Aug 14 '18 at 22:08

3 Answers3

4

Try to use this function. The following code returns the text composed only from the range of Unicode char codes; in this case from 0x0020 (space char) to 0x007E (tilde char) while the rest is omitted. You can define one or more character ranges to extend the valid characters for this function. The list of Unicode characters including their codes can be found e.g. here.

CREATE FUNCTION dbo.CharRangeOnly (@InputText NVARCHAR(4000))
  RETURNS NVARCHAR(4000)
AS
BEGIN
  DECLARE @Index INT
  DECLARE @BufferChar INT
  DECLARE @OutputText NVARCHAR(4000)

  SET @Index = 1
  SET @OutputText = ''

  WHILE @Index < LEN(@InputText) + 1
    BEGIN
      SET @BufferChar = UNICODE(SUBSTRING(@InputText, @Index, 1))

      -- here you can specify the valid character range(s)
      IF @BufferChar BETWEEN 0x0020 AND 0x007E
        SET @OutputText = @OutputText + NCHAR(@BufferChar)

      SET @Index = @Index + 1
    END

  RETURN @OutputText
END

You can use it this way

SELECT dbo.CharRangeOnly (N'•■乕ขTeᶕst໙ ཙof  β俼 ޠ➽Unᦗic៘ode✸ᾦ')

The function called with this crazy mix of chars surprisingly returns

'Test of Unicode'
  • @daemon_x how do you make a call to this function? – user441365 Jun 09 '11 at 17:17
  • @Andriy - I've mentioned the range is from Unicode's char codes 32 (0x0020 - space) to 126 (0x007E - tilde), I know I could choose a better name; not `BasicLatinOnly` it's little bit confusing. But you can simply modify the range according what you need. E.g. if you would like to get only the whole Arabic script then you set the range from 1536 (0x600 - punctuation sign) to 1791 (0x6FF - letter Heh with inverted V) so you'll change the line `IF @BufferChar BETWEEN 32 and 126` to `IF @BufferChar BETWEEN 1536 and 1791` –  Jun 09 '11 at 21:52
  • @Andriy - or you can choose more than one group, e.g. my own basic Latin set `:)` and the whole Arabic script would look like this `IF (@BufferChar BETWEEN 32 AND 126) OR (@BufferChar BETWEEN 1536 AND 1791)` –  Jun 09 '11 at 21:59
  • Sorry, I can see now that I missed that part in your answer where you were already explaining all this. Thanks for taking the time for explaining it to me again. – Andriy M Jun 09 '11 at 22:12
  • @Andriy - you're welcome; it was quite misleading so I've modified it a bit. thanks for the hint ;) –  Jun 09 '11 at 22:25
0

This may be of some help. You can do it with SQL, or build a CLR method and import it into your instance of SQL server to make the determination. Either way you'll need some kind of scalar function comparing character by character.

link

CodeMonkey1313
  • 15,717
  • 17
  • 76
  • 109
-1

According to this (Archived) Microsoft Support Issue:

You must precede all Unicode strings with a prefix N when you deal with Unicode string constants in SQL Server

KyleMit
  • 30,350
  • 66
  • 462
  • 664
MadushM
  • 397
  • 3
  • 17
  • Looks like OP wants to strip out special characters, so using Unicode strings isn't the limiting factor – KyleMit Aug 14 '18 at 22:07