-1

I am facing to an issue that I cannot update or replace some characters in my database. Here is how this text look like in my column when I retrieve it:

enter image description here

As you can see, there is an unknown characters between 'master' and 'degree' which I cannot even paste it here.

I also tried to update and replace it with below code (I cannot paste that two vertical lines here since they are not supported in any browser and I am not sure what they are, Please see the picture above to see what is in my SQL statement).

begin transaction
update gm_desc set projdesc=replace(projdesc,'%â  s%','') where projdesc like '%âs%' and proposalno = '15-149-01'

You can see the real SQL Statement here:

enter image description here

I tried to update, or replace it but I cannot do it. The update statement successfully works but I still see that weird special charters. I would be appreciate to help me.

Dale K
  • 25,246
  • 15
  • 42
  • 71
lili
  • 83
  • 9
  • 2
    Is `projdesc` an `NVarChar`? Have you tried matching characters _not_ in a set, e.g. `like '%[^a-z0-9]%'` to find the rows? You can dump the column in hexadecimal to get the character values, e.g. `select projdesc, Cast( projdesc as VarBinary(1000) ) as Hex from gm_desc where projectdesc like '%[^a-z0-9 ]%';`. – HABO Apr 17 '20 at 01:11
  • To support HABO's comment: I'm pretty sure, that this is some kind of 2-byte encoded *apostrophe character* representing *"master's degree"*. First of all you should find out your string's encoding. The cast to `VARBINARY(MAX)` - as suggested by @HABO - will help you to see the internals. – Shnugo Apr 17 '20 at 15:32

1 Answers1

3

Here's a scalar-valued function which removes all non-alphanumeric characters (preserves spaces) from a string.

Hopefully it helps!

dbfiddle

create function dbo.get_alphanumeric_str
(
    @string varchar(max)
)
returns varchar(max)
as
begin

    declare @ret varchar(max);

    with nums as (
        select 1 as n
        union all select n+1 from nums
        where n < 256
    )
    select @ret = replace(stuff(
        (
            select '' + substring(@string, nums.n, 1)
            from nums
            where patindex('%[^0-9A-Za-z ]%', substring(@string, nums.n,1)) = 0 
            for xml path('')
        ), 1, 0, ''
    ), '&#x20;', ' ')
    option (MAXRECURSION 256)

    return @ret;
end

Usage

select dbo.get_alphanumeric_str('Helloᶄ âWorld 1234⅊⅐')

Returns: Hello World 1234

How it works

  • The nums CTE is just to get a list of numbers (you can set the maximum of 256 to a higher value if your strings are longer; n.b. option (MAXRECURSION n) is for this CTE but has to be placed at the query)
  • The stuff essentially iterates through the string, using the list of numbers above and extracts a substring of length 1; each of these chars are checked if they match the [^0-9A-Za-z ] regex group (0-9 all digits, A-Za-z all letters both lower and upper case, and a single space character)
  • If they match, patindex() should return 0; i.e. index zero.
  • Use replace(string, '&#x20;', ' ') for the space character as the xml path returns a special encoding, see this question.
  • Use a binary collation for accented characters; see this answer
Jamie Phan
  • 1,112
  • 7
  • 15
  • 1
    In this case I'm pretty sure, that this won't help, at least not in a sure way. Assumably the given text stands for *"master's degree"*, where the *strange characters* are the 2-byte encoded apostrophe. Your function accepts only 1-byte encoded text (`varchar`) and returns as such. If my assumption is true, you would not get the original text, but the text without the apostrophe. In other cases you might return additional characters (in cases, where one byte of the 2-byte-code-point translates to a plain character by chance). – Shnugo Apr 17 '20 at 15:37
  • Thank you so much I will check it today. I appreciate your valuable time – lili Apr 17 '20 at 21:18