2

We have huge list of attributes in proper case and we would like to get output with proper space after each word

For example -

As-Is: ServiceProviderReferenceNumber

To-Be: Service Provider Reference Number

Any ideas on how I can convert this in T-SQL ? Any help would be appreciated

Note: We are using SQL Server 2016

user2827587
  • 231
  • 4
  • 15
  • Take a look at this previous answer http://stackoverflow.com/questions/23470794/split-words-with-a-capital-letter-in-sql – Rich Benner Dec 19 '16 at 08:36
  • 5
    Some abbreviation like `NASA` will become `N A S A` you may need to consider. – Eric Dec 19 '16 at 09:20

5 Answers5

4

Try this It Worked

Create Function dbo.Split_On_Upper_Case(@Temp VarChar(100))
Returns VarChar(100)
AS
Begin

Declare @KeepValues as varchar(50)
Set @KeepValues = '%[^ ][A-Z]%'
While PatIndex(@KeepValues collate Latin1_General_Bin, @Temp) > 0
    Set @Temp = Stuff(@Temp, PatIndex(@KeepValues collate Latin1_General_Bin, @Temp) + 1, 0, ' ')

Return @Temp
End
sai bharath
  • 792
  • 2
  • 10
  • 22
  • 2
    I just added your approach to my test scenario (see my answer) and found, that yours is even faster than mine (although it is `USF`...). I guess, that using `STUFF` is faster internally as string methods. Anyway worth an upvote! – Shnugo Dec 19 '16 at 10:14
3

This is an approach without a loop...

Besides the fact, that loops are something to avoid, this makes the handling of a full table really easy.

First I create a mock-up table with three strings

DECLARE @tbl TABLE(ID INT IDENTITY,YourString VARCHAR(100));
INSERT INTO @tbl VALUES('ServiceProviderReferenceNumber'),('SomeOther'),('AndOneMore');

The query will first create a numbers tally table on the fly, then split the string in single chars, check for capitalized letters and add a blank.

Finally this is reconcatenated and trimmed

Attention If you might need more than 1000 characters, just add one more ,Numbers AS c to the Tally (providing 10.000 numbers then)

WITH Numbers AS
 (SELECT Nr FROM(SELECT Nr FROM(VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS x(Nr)) AS y)
,Tally AS
(
    SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS Nr
    FROM Numbers,Numbers AS a,Numbers AS b
)
,Splitted AS
(
    SELECT ID 
          ,CASE WHEN ASCII(Chars.OneChar) BETWEEN ASCII('A') AND ASCII('Z') THEN ' ' + Chars.OneChar ELSE Chars.OneChar END AS TheChar
    FROM @tbl AS t
    CROSS APPLY (SELECT TOP(LEN(t.YourString)) Tally.Nr FROM Tally) AS Nmbr
    CROSS APPLY (SELECT SUBSTRING(t.YourString,Nmbr.Nr,1) AS OneChar) AS Chars
)
SELECT ID
      ,LTRIM((
        SELECT s.TheChar AS [*]
        FROM Splitted AS s
        WHERE s.ID=Splitted.ID
        FOR XML PATH('')
       )) AS ReConcatenated
FROM Splitted
GROUP BY ID

The result

1   Service Provider Reference Number
2   Some Other
3   And One More

UPDATE Comparison of ad-hoc vs. loop

CREATE DATABASE testDB;
GO

USE testDB;
GO

CREATE TABLE tbl(ID INT IDENTITY,YourString VARCHAR(100));
GO

CREATE FUNCTION dbo.TestF(@String VARCHAR(4000))
RETURNS VARCHAR(4000)
AS
BEGIN
    DECLARE @StringNew nvarchar(100) = '';
    DECLARE @Char nvarchar(1);
    DECLARE @len int = LEN(@String);
    DECLARE @i int = 0;

    WHILE @i <= @len
      BEGIN
        SET @i = @i+1;
        SET @Char = substring(@String,@i,1);

        IF (UNICODE(@Char) = UNICODE(UPPER(@Char)) AND @i > 1)
          SET @StringNew = @StringNew + ' ' + @Char;
        ELSE
          SET @StringNew = @StringNew + @Char;
      END;

      RETURN @StringNew
    END
GO

INSERT INTO tbl VALUES('ServiceProviderReferenceNumber'),('SomeOther'),('AndOneMore');
GO 100000

DECLARE @d DATETIME=GETDATE();

WITH Numbers AS
 (SELECT Nr FROM(SELECT Nr FROM(VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS x(Nr)) AS y)
,Tally AS
(
    SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS Nr
    FROM Numbers,Numbers AS a,Numbers AS b
)
,Splitted AS
(
    SELECT ID 
          ,CASE WHEN ASCII(Chars.OneChar) BETWEEN ASCII('A') AND ASCII('Z') THEN ' ' + Chars.OneChar ELSE Chars.OneChar END AS TheChar
    FROM tbl AS t
    CROSS APPLY (SELECT TOP(LEN(t.YourString)) Tally.Nr FROM Tally) AS Nmbr
    CROSS APPLY (SELECT SUBSTRING(t.YourString,Nmbr.Nr,1) AS OneChar) AS Chars
)
SELECT ID
      ,LTRIM((
        SELECT s.TheChar AS [*]
        FROM Splitted AS s
        WHERE s.ID=Splitted.ID
        FOR XML PATH('')
       )) AS ReConcatenated
FROM Splitted
GROUP BY ID;

SELECT CAST(GETDATE()-@d AS TIME);
GO

DECLARE @d DATETIME=GETDATE();

SELECT ID,dbo.TestF(tbl.YourString) AS ReConcatenated
FROM tbl

SELECT CAST(GETDATE()-@d AS TIME);
GO
USE master;
GO
DROP DATABASE testDB;

The result

**ad-hoc** 2.66 Seconds
**loop**   5.33

UPDATE 2: This answer by @sai bharath is faster...

The result

**ad-hoc**              2.66 Seconds
**loop**                5.33
**while with `STUFF`**  1.71
Community
  • 1
  • 1
Shnugo
  • 66,100
  • 9
  • 53
  • 114
  • Looks OK, even if I think this wouldn't be much faster / more stable than my approach... furthermore: you approach might work on SQL Server, but won't on APS / PDW... just to mention. ;-) – Tyron78 Dec 19 '16 at 09:37
  • 1
    @Tyron78 I just made the comparison. Wrapped your code in an `USF` and use both approaches on 300.000 entries. Mine took 2.66 seconds, yours was 5.33... I'll edit my answer and provide my test scenario. – Shnugo Dec 19 '16 at 09:49
  • Oh, OK. Great. Thanks for the info - I'll try it next time I face this kind of task (and outside of the APS / PDW). – Tyron78 Dec 19 '16 at 10:09
  • 1
    @Tyron78, look at my `update 2`. There is an USF-approach even faster than my *ad-hoc* sql... – Shnugo Dec 19 '16 at 10:17
  • can you please test my solution in your environment and post performances results? – MtwStark Dec 19 '16 at 11:22
  • @MtwStark You could've done this on your own, the code is there for *copy'n'paste* ... Your *superfast recursive CTE* is - by far! - the slowest of all. A *recursice CTE* is an *iterative CTE* actually. Internally this is handled as *RBAR* and is not better than any loop, in this case it's even worse, because this leads to a *nested loop* – Shnugo Dec 19 '16 at 11:35
  • please read carefully.. I wrote "please test ..in your environment.." because I already tested it.. in my environment, and my recursive cte resulted 10 times faster than XML PATH – MtwStark Dec 19 '16 at 11:39
  • @MtwStark I re-tested it with 300.000 rows and must tell you, that your approach is above 11 seconds, while the other approaches bring roughly the same results as before... – Shnugo Dec 19 '16 at 11:59
  • I think that performances depends on sql server version and hardware. I based my comment on execution plan, recursive CTE is really 10 times better than XML PATH, even if time to run is higher. – MtwStark Dec 19 '16 at 13:52
  • @MtwStark Sorry do not get this... What does *even if time to run is higher* mean? If we speak about performance this is the one and only goal, isn't it? – Shnugo Dec 19 '16 at 13:54
  • I mean that on different hardwares and different sql versions (express vs standard or enterprise) the same query will perform better or worse than others, but execution plan cost should be the same – MtwStark Dec 19 '16 at 13:57
  • @MtwStark Of course, the hardware and version will have some influencem but I'd assume, that the actual run time will rather scale linearly... – Shnugo Dec 19 '16 at 14:12
  • @MtwStark One more point: All solutions will need a full table scan. The big deal with *inline* or *procedural* is on combining resultsets, indexes, statistics... All of this is not used here. This is - at the final end - just a *RBAR-string-manipulation* – Shnugo Dec 19 '16 at 14:15
1

Here an approach, which loops through the string and compares the Uncicode Values of characters. I chose to compare the numbers instead of the characters, because depending on the DB configuration 'a'='A' evaluates as true.

DECLARE @String nvarchar(100) = 'ServiceProviderReferenceNumber';
DECLARE @StringNew nvarchar(100) = '';
DECLARE @Char nvarchar(1);
DECLARE @len int = LEN(@String);
DECLARE @i int = 0;

WHILE @i <= @len
  BEGIN
    SET @i = @i+1;
    SET @Char = substring(@String,@i,1);

    IF (UNICODE(@Char) = UNICODE(UPPER(@Char)) AND @i > 1)
      SET @StringNew = @StringNew + ' ' + @Char;
    ELSE
      SET @StringNew = @StringNew + @Char;
  END;

  PRINT @StringNew
Shnugo
  • 66,100
  • 9
  • 53
  • 114
Tyron78
  • 4,117
  • 2
  • 17
  • 32
  • In most cases one should avoid loops... This approach will need a `CURSOR` if you have to deal with many table rows, which is even worse... – Shnugo Dec 19 '16 at 09:13
  • @Eric A *loop approach* in connection with a **multi-statement!!** function combines two things one really should avoid... – Shnugo Dec 19 '16 at 09:27
  • I added the check on first position - so now there shouldn't be an additional space at the beginning anymore. However, same can be achieved by adding a trim to the end. Yes, you can put it into a USF - should work quite OK. Another possibility might be a REPLACE of all uppercase characters with Space + uppercase character... but this would be a quite nasty approach. – Tyron78 Dec 19 '16 at 09:30
0

try this superfast recursive CTE

;with
w as (
    select 
        id, 
        cast(YourColumn collate LATIN1_GENERAL_BIN as varchar(500)) ss 
    from YourTable w
),
r as (
    select id, ss s, 2 l, PATINDEX('%[A-Z]%', ss) p, 1 i
    from w

    union all
    select w.id, cast(stuff(s, p, 0, ' ') as varchar(500)), p + 1 l, pos + p + 1 p, pos i
    from w
    join (
        select id, s, PATINDEX('%[A-Z]%', substring(s, p+1, 500) ) pos, p, i
        from r
    ) rr on w.id = rr.id and i>0
)
select id, s
from r
where i=0
OPTION (MAXRECURSION 500); 
MtwStark
  • 3,866
  • 1
  • 18
  • 32
0
    ;WITH a(s) AS 
    (
     SELECT 'ServiceProviderReferenceNumber' UNION 
     SELECT 'ConvertProperCaseToSpaceAfterProperCase' 
    )
    SELECT LTRIM(b.ss) FROM a
    CROSS APPLY (
        SELECT CASE WHEN ASCII(SUBSTRING(a.s,sv.number,1))  BETWEEN 65 AND 90 THEN ' '+SUBSTRING(a.s,sv.number,1) ELSE SUBSTRING(a.s,sv.number,1) END 
        FROM master.dbo.spt_values AS sv 
        WHERE sv.type='P' AND sv.number BETWEEN 1 AND LEN(a.s)
        FOR XML PATH('')

    ) b(ss)
Convert Proper Case To Space After Proper Case
Service Provider Reference Number
Nolan Shang
  • 2,312
  • 1
  • 14
  • 10