-4

I have some data like below in string my table in SQL server when I try to find using like it's not working. I want replace that unidentified character to empty

I need Mu��_��__oz to Muoz

Example:

->Mu��_��__oz
->S__��_��__��_��___��_��__��_��______ndergaard
->Gaas��_��__��_��___��_��__��_��____��_��__��_��___��_��__��_��____��_��__��_��___��_��__��_��____��_��__��_��___��_��__��_��____��_��__��_��___��_��__��_��_____��_��__��_��___��_��__��_��____�
tripleee
  • 175,061
  • 34
  • 275
  • 318
RakeshV
  • 3
  • 5
  • 2
    We have no idea how you produced this data or which bytes are actually in there. The black balls are basically an indication that you tried to paste as text something which isn't really. Could you please [edit] your question to provide the problematic records in some unambiguous format (hex always works, though there may be a good canonical format for your particular SQL product?) and also check out the [Stack Overflow `character-encoding` tag info page.](/tags/character-encoding/info) – tripleee Nov 13 '19 at 06:51
  • The proper fix is probably to change to Muñoz anyway...? – tripleee Nov 13 '19 at 06:52
  • @tripleee the above sample data added by the user, I don't know which character they insert – RakeshV Nov 13 '19 at 07:21
  • The problem here is we also cannot know which characters you are getting out. – tripleee Nov 13 '19 at 07:29
  • 1
    Being able to guess that these names are Muñoz and Søndergaard should probably get you to a known encoding once you see what the actual bytes are. If you don't know what we are talking about, now is the time when you should read http://kunststube.net/encoding/ – tripleee Nov 13 '19 at 07:31
  • How to strip all non-alphabetic characters from string in SQL Server? - the above question different from my question - before sending a negative reply to refer properly - i want to remove unidentify character not special characters – RakeshV Nov 14 '19 at 06:24
  • Then why did you accept an answer which is pretty much identical to the accepted answer on that question? Note that identical *answers* is a good reason to close as duplicate, even if the questions are different. – tripleee Nov 14 '19 at 06:24
  • There isn't really a proper way to identify "invalid" characters. What you have are valid characters, just not the ones you would like to have there (and as repeatedly pointed out, the *real* solution is to figure out the correct characters, not just remove the junk). – tripleee Nov 14 '19 at 06:26

1 Answers1

2

You can refer this Question and this excellent Answer by Even Mien

CREATE FUNCTION [dbo].[fn_StripCharacters]
(
    @String NVARCHAR(MAX), 
    @MatchExpression VARCHAR(255)
)
RETURNS NVARCHAR(MAX)
AS
BEGIN
    SET @MatchExpression =  '%['+@MatchExpression+']%'

    WHILE PatIndex(@MatchExpression, @String) > 0
        SET @String = Stuff(@String, PatIndex(@MatchExpression, @String), 1, '')

    RETURN @String

END

I have tested this function with your values and it works great. Check Demo here.

SELECT dbo.fn_StripCharacters('Mu��_��__oz','^a-z0-9') -->Muoz
Arun Palanisamy
  • 5,281
  • 6
  • 28
  • 53
  • That's what the OP is asking, but removing stuff you don't understand is almost never the correct solution. – tripleee Nov 13 '19 at 07:38
  • 1
    With the caveat that **you do not want to do this**. People's names have non-ASCII characters in them. That's a fact of life in the global world, and something you need to be able to handle in your application. It isn't that hard. Unicode has been around for many years now. – Cody Gray - on strike Nov 13 '19 at 07:39