As Tim Biegeleisen mentioned, this isn't easy in SQL Server, as it doesn't support regular expressions. As such you have to be some what inventive.
As we don't know what version of SQL Server you are using (though I did ask) I am assuming you are using the latest version of SQL Server, and have access to both STRING_AGG
and TRIM
. If not, you'll need to use the old FOR XML PATH
and STUFF
method for string aggregation, and LTRIM
and RTRIM
with nested REPLACE
s for TRIM
.
Anyway, what I do here is collate the value to a binary collation that is both case sensitive and also orders the letters in Uppercase and then Lowercase (though a collation that does Lowercase and then Uppercase would be fine too, it's just important it's not alphabetically and then case). So in an order like ABC...Zabc...z
rather than like AaBb...Zz
. I then use a Tally to split the collated string into it's individual characters.
I then use STRING_AGG
with a CASE
expression to only retain the Underscore characters (which you appear to want as well) and just the uppercase letters. Finally I use TRIM
to remove any leading and trailing underscores; without this the value returned would be 'ABC_DEF_GHI_'
.
I also assume you are doing this against a table, rather than a scalar value, which gives this:
DECLARE @SomeString varchar(100) = 'ABC_DEF_GHI_jkl';
WITH N AS(
SELECT N
FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL))N(N)),
Tally AS(
SELECT TOP (SELECT MAX(LEN(V.SomeString)) FROM (VALUES(@SomeString))V(SomeString)) --This would be your table
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS I
FROM N N1, N N2) --100 rows, add more cross joins for more rows
SELECT TRIM('_' FROM STRING_AGG(CASE WHEN SS.C LIKE '[A-Z_]' THEN SS.C END,'') WITHIN GROUP (ORDER BY T.I)) AS NewString
FROM (VALUES(@SomeString))V(SomeString) --This would be your table
CROSS APPLY (VALUES(V.SomeString COLLATE Latin1_General_BIN))C(SomeString) --Collate to a collation that is both case sensitive and orders Uppercase first
JOIN Tally T ON LEN(C.SomeString) >= T.I
CROSS APPLY (VALUES(SUBSTRING(C.SomeString,T.I,1)))SS(C) --Get each character
GROUP BY V.SomeString;
db<>fiddle
Of course, a "simpler" solution might be to find and implement a Regex CLR function and just use that.
Turns out the OP is using 2014... This means the above needs some significant refactorying. I am afraid I don't explain how FOR XML PATH
or REPLACE
work here (as I put the effort into the original solution), however, a search will yield you the details.:
WITH N AS(
SELECT N
FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL))N(N)),
Tally AS(
SELECT TOP (SELECT MAX(LEN(V.SomeString)) FROM (VALUES(@SomeString))V(SomeString)) --This would be your table
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS I
FROM N N1, N N2) --100 rows, add more cross joins for more rows
SELECT REPLACE(LTRIM(RTRIM(REPLACE((SELECT CASE WHEN SS.C LIKE '[A-Z_]' THEN SS.C END
FROM (VALUES(V.SomeString COLLATE Latin1_General_BIN))C(SomeString) --Collate to a collation that is both case sensitive and orders Uppercase first
JOIN Tally T ON LEN(C.SomeString) >= T.I
CROSS APPLY (VALUES(SUBSTRING(C.SomeString,T.I,1)))SS(C) --Get each character
ORDER BY T.I
FOR XML PATH(''),TYPE).value('(./text())[1]','varchar(100)'),'_',' '))),' ','_') AS NewString
FROM (VALUES(@SomeString))V(SomeString) --This would be your table
GROUP BY V.SomeString;