Get the Capital letters from a string

Question

I have this requirement to extract the capital letters from a column in SQL Server.

EX: ABC_DEF_ghi

I only want to extract ABC_DEF.

Sometimes the string could be like ABC_DEF_GHI_jkl, so in this case it will be ABC_DEF_GHI

Any suggestions would be helpful.

Thanks in advance.

This would be a cinch using regular expressions. But SQL Server doesn't really have any regex support. I would export the table, do the cleanup in a script, and then re-import. — Tim Biegeleisen, Jun 30 '21 at 09:35
Those are not just the capital letters, you also appear to require the underscores. — BCdotWEB, Jun 30 '21 at 09:48

Thom A · Answer 1 · 2021-06-30T11:13:22.067

As Tim Biegeleisen mentioned, this isn't easy in SQL Server, as it doesn't support regular expressions. As such you have to be some what inventive.

As we don't know what version of SQL Server you are using (though I did ask) I am assuming you are using the latest version of SQL Server, and have access to both STRING_AGG and TRIM. If not, you'll need to use the old FOR XML PATH and STUFF method for string aggregation, and LTRIM and RTRIM with nested REPLACEs for TRIM.

Anyway, what I do here is collate the value to a binary collation that is both case sensitive and also orders the letters in Uppercase and then Lowercase (though a collation that does Lowercase and then Uppercase would be fine too, it's just important it's not alphabetically and then case). So in an order like ABC...Zabc...z rather than like AaBb...Zz. I then use a Tally to split the collated string into it's individual characters.

I then use STRING_AGG with a CASE expression to only retain the Underscore characters (which you appear to want as well) and just the uppercase letters. Finally I use TRIM to remove any leading and trailing underscores; without this the value returned would be 'ABC_DEF_GHI_'.

I also assume you are doing this against a table, rather than a scalar value, which gives this:

DECLARE @SomeString varchar(100) = 'ABC_DEF_GHI_jkl';

WITH N AS(
    SELECT N
    FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL))N(N)),
Tally AS(
    SELECT TOP (SELECT MAX(LEN(V.SomeString)) FROM (VALUES(@SomeString))V(SomeString)) --This would be your table
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS I
    FROM N N1, N N2) --100 rows, add more cross joins for more rows
SELECT TRIM('_' FROM STRING_AGG(CASE WHEN SS.C LIKE '[A-Z_]' THEN SS.C END,'') WITHIN GROUP (ORDER BY T.I)) AS NewString
FROM (VALUES(@SomeString))V(SomeString) --This would be your table
     CROSS APPLY (VALUES(V.SomeString COLLATE Latin1_General_BIN))C(SomeString) --Collate to a collation that is both case sensitive and orders Uppercase first
     JOIN Tally T ON LEN(C.SomeString) >= T.I
     CROSS APPLY (VALUES(SUBSTRING(C.SomeString,T.I,1)))SS(C) --Get each character
GROUP BY V.SomeString;

db<>fiddle

Of course, a "simpler" solution might be to find and implement a Regex CLR function and just use that.

Turns out the OP is using 2014... This means the above needs some significant refactorying. I am afraid I don't explain how FOR XML PATH or REPLACE work here (as I put the effort into the original solution), however, a search will yield you the details.:

WITH N AS(
    SELECT N
    FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL))N(N)),
Tally AS(
    SELECT TOP (SELECT MAX(LEN(V.SomeString)) FROM (VALUES(@SomeString))V(SomeString)) --This would be your table
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS I
    FROM N N1, N N2) --100 rows, add more cross joins for more rows
SELECT REPLACE(LTRIM(RTRIM(REPLACE((SELECT CASE WHEN SS.C LIKE '[A-Z_]' THEN SS.C END
                                    FROM (VALUES(V.SomeString COLLATE Latin1_General_BIN))C(SomeString) --Collate to a collation that is both case sensitive and orders Uppercase first
                                         JOIN Tally T ON LEN(C.SomeString) >= T.I
                                         CROSS APPLY (VALUES(SUBSTRING(C.SomeString,T.I,1)))SS(C) --Get each character
                                    ORDER BY T.I
                                    FOR XML PATH(''),TYPE).value('(./text())[1]','varchar(100)'),'_',' '))),' ','_') AS NewString
FROM (VALUES(@SomeString))V(SomeString) --This would be your table
GROUP BY V.SomeString;

Then you'll need to use `FOR XML PATH` (and `STUFF`) and replace the `TRIM` functions as I direct about now, @udaykiran . — Thom A, Jun 30 '21 at 11:04
[string_agg for sql server pre 2017](https://stackoverflow.com/questions/49361088/string-agg-for-sql-server-pre-2017) Though the above will need a decent amount of refactoring too. — Thom A, Jun 30 '21 at 11:04
I have added a 2014 compatible version for you, @udaykiran , but please remember in future to tag the version you are using; especially if it's a version that isn't fully supported any more. Without versions, most users will assume you are at *least* using a fully supported version of a product, and may well assume you are using the latest version. — Thom A, Jun 30 '21 at 11:14

score 0 · Answer 2 · answered Jun 30 '21 at 15:23

For SQL 2017 and upper :

DECLARE @SomeString varchar(100) = 'ABC_DEF_GHI_jkl';
WITH T0 AS
(
SELECT 1 AS INDICE, 
       SUBSTRING(@SomeString, 1, 1) AS RAW_LETTER, 
       SUBSTRING(UPPER(@SomeString), 1, 1) AS UP_LETTER
UNION ALL
SELECT INDICE + 1, 
       SUBSTRING(@SomeString, INDICE + 1, 1) AS RAW_LETTER, 
       SUBSTRING(UPPER(@SomeString), INDICE + 1, 1)
FROM   T0
WHERE  INDICE < LEN(@SomeString)
)
SELECT STRING_AGG(RAW_LETTER, '') WITHIN GROUP (ORDER BY INDICE)
FROM   T0
WHERE  RAW_LETTER COLLATE Latin1_General_BIN = UP_LETTER;

For SQL Server previous than 2017 :

WITH T0 AS
(
SELECT 1 AS INDICE, 
       SUBSTRING(@SomeString, 1, 1) AS RAW_LETTER, 
       SUBSTRING(UPPER(@SomeString), 1, 1) AS UP_LETTER
UNION ALL
SELECT INDICE + 1, 
       SUBSTRING(@SomeString, INDICE + 1, 1) AS RAW_LETTER, 
       SUBSTRING(UPPER(@SomeString), INDICE + 1, 1)
FROM   T0
WHERE  INDICE < LEN(@SomeString)
)
SELECT STUFF((SELECT '' + RAW_LETTER
              FROM   T0
              WHERE  RAW_LETTER COLLATE Latin1_General_BIN = UP_LETTER
              ORDER BY INDICE
              FOR XML PATH('')), 1, 0, '');

Get the Capital letters from a string

2 Answers2