6

How do I replace only the last character of the string:

select REPLACE('this is the news with a þ', 'þ', '__')

The result I'm getting is:

__is is __e news wi__ a __

EDIT The collation of the server and the database is Latin1_General_CI_AS

The actual query I'm running is REPLACE(note, 'þ', '') where note is an ntext column. The point is to strip out the thorn characters because that character gets used later in the process as a column delimiter. (Please don't suggest changing the delimiter, that's just not going to happen given the extent to which it's been used!)

I've tried using the N prefix even using the test select statement, here are the results:

Still broken!

Solomon Rutzky
  • 46,688
  • 9
  • 128
  • 171
Sean
  • 14,359
  • 13
  • 74
  • 124
  • I have run this query on MS SQL and I got result as `this is the news with a __` – SelvaS Mar 12 '15 at 14:50
  • try select REPLACE(N'this is the news with a þ', 'þ', '__') -- it is not the answer, it is just for comments – Dmitrij Kultasev Mar 12 '15 at 14:51
  • 3
    1) You are not working with Unicode since you are not prefixing any of those strings with a capital `N`. 2) what is your database default Collation? 3) If you run the following, you will get the result you show in the question: `SELECT REPLACE(N'this is the news with a þ', N'þ', N'__');` – Solomon Rutzky Mar 12 '15 at 14:52
  • It depends on the collation of your current database - when I try it I get `this is the news with a __` – codeulike Mar 12 '15 at 14:55

2 Answers2

12

The þ character (Extended ASCII { via ISO-8859-1 and ANSI Code Page 1252 } & UNICODE value of 254) is known as "thorn" and in some languages equates directly to th:

  • Technical info on the character here: http://unicode-table.com/en/00FE/

  • Explanation of that character and collations here: http://userguide.icu-project.org/collation/customization. Search the page — typically Control-F — for "Complex Tailoring Examples" and you will see the following:

    The letter 'þ' (THORN) is normally treated by UCA/root collation as a separate letter that has primary-level sorting after 'z'. However, in Swedish and some other Scandinavian languages, 'þ' and 'Þ' should be treated as just a tertiary-level difference from the letters "th" and "TH" respectively.

If you do not want þ to equate to th, then force a Binary collation as follows:

SELECT REPLACE(N'this is the news with a þ' COLLATE Latin1_General_100_BIN2,
                 N'þ', N'__');

Returns:

this is the news with a __

For more info on working with Collations, Unicode, encodings, etc, please visit: Collations Info

Solomon Rutzky
  • 46,688
  • 9
  • 128
  • 171
  • ASCII doesn't have a codepoint 254 nor the þ character. If you go `SELECT name, collation_name, COLLATIONPROPERTY(collation_name, N'CodePage') FROM master.sys.databases`, I'd be surprise if you see code page 20127 (or 367) listed. – Tom Blodget Mar 12 '15 at 17:16
  • @TomBlodget Well, yes, _strictly_ speaking, ASCII proper is just 0 - 127. But in practical terms, this character is 254 in [8859-1 Latin1](http://www.ascii.ca/iso8859.1.htm) / [ANSI Code Page 1252](http://www.ascii.ca/cp1252.htm). – Solomon Rutzky Mar 12 '15 at 17:39
  • @TomBlodget For some reason I stumbled upon this again after 5 years, and in re-reading it I think I see the point you were making. I think before I was feeling that "ASCII" alone was ok because that's how so many people talk about it, but now I feel that I shouldn't help perpetuate that misuse / oversimplification. So, I updated to clarify that I was meaning Extended ASCII and not standard ASCII. – Solomon Rutzky Sep 14 '20 at 20:38
0

This might work for you:

DECLARE @text NVARCHAR(1000) = N'this is the news with a þ';
DECLARE @find NVARCHAR(1000) = N'þ';
DECLARE @replace NVARCHAR(1000) = N'_';

SELECT REPLACE(CAST(@text AS VARCHAR), CAST(@find AS VARCHAR), CAST(@replace AS VARCHAR));
Dmitrij Kultasev
  • 5,447
  • 5
  • 44
  • 88
  • 3
    FYI, you should never use variable-length datatypes without specifying the length. There are two different defaults --1 or 30 -- depending on where it is being used. – Solomon Rutzky Mar 12 '15 at 14:59