-1

I have been given a T-SQL task: to convert/format names which are in ALL CAPS into Title Case. I have decided that splitting the names into tokens, and capitalizing the first letter out of each token, would be a reasonable approach (I am willing to take advice if there's a better option, especially in T-SQL).

That said, to accomplish this, I'd have to split the name fields on spaces AND dashes, hyphens, etc. Then, once it is tokenized, I can worry about normalizing the case.

Is there any reasonable way to split a string along any delimiter in a list?

nomen
  • 3,626
  • 2
  • 23
  • 40
  • 1
    Take a look at the below thread: https://stackoverflow.com/questions/230138/sql-server-make-all-upper-case-to-proper-case-title-case – Shawn Mar 26 '21 at 21:57
  • 1
    Names? You have no idea the can of worms you're opening. This is one of those things that sounds simple until you actually start to dig into the literally thousands of edge cases you absolutely MUST account for. – Joel Coehoorn Mar 27 '21 at 01:19
  • Suggested reading: [Falsehoods Programmers Believe About Names](https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/). – HABO Mar 27 '21 at 02:49
  • @JoelCoehoorn: I didn't open a can of worms. The tester did. – nomen Mar 27 '21 at 16:25

1 Answers1

0

If ease & performance is important then grab a copy of PatExtract8k.

Here's a basic example where I split on any character that is not a letter or number ([^a-z0-9]):

-- Sample String
DECLARE @string VARCHAR(8000) = 'abc.123&xyz!4445556__5566^rrr';

-- Basic Use
SELECT pe.* FROM   samd.patExtract8K(@string,'[^a-z0-9]') AS pe;

Output:

itemNumber      itemIndex   itemLength  item
--------------- ----------- ----------- -------------
1               1           3           abc
2               5           3           123
3               9           3           xyz
4               13          7           4445556
5               22          4           5566
6               27          3           rrr

It returns what you need as well as:

  1. the length of the item (ItemLength)
  2. It's position in the string (ItemIndex)
  3. It's ordinal position in the string (ItemNumber.)

Now against a table. Here we're doing the same thing but I'll explicitly call out the characters I want to use as a delimiter. Here it's any of these characters: *.&,?%/>

-- Sample Table
DECLARE @table TABLE (SomeId INT IDENTITY, SomeString VARCHAR(100));
INSERT @table VALUES('abc***332211,,XXX'),('abc.123&&555%jjj'),('ll/111>ff?12345');

SELECT      t.*, pe.*
FROM        @table                                      AS t
CROSS APPLY samd.patExtract8K(t.SomeString,'[*.&,?%/>]') AS pe;

This returns:

SomeId      SomeString          itemNumber   itemIndex  itemLength  item
----------- ------------------- ------------ ---------- ----------- ---------
1           abc***332211,,XXX   1            1          3           abc
1           abc***332211,,XXX   2            7          6           332211
1           abc***332211,,XXX   3            15         3           XXX
2           abc.123&&555%jjj    1            1          3           abc
2           abc.123&&555%jjj    2            5          3           123
2           abc.123&&555%jjj    3            10         3           555
2           abc.123&&555%jjj    4            14         3           jjj
3           ll/111>ff?12345     1            1          2           ll
3           ll/111>ff?12345     2            4          3           111
3           ll/111>ff?12345     3            8          2           ff
3           ll/111>ff?12345     4            11         5           12345

On the other hand - If I wanted to extract the delimiters I could change the pattern like this: [^*.&,?%/>]. Now the same query returns:

SomeId      itemNumber           itemIndex            itemLength  item
----------- -------------------- -------------------- ----------- ---------
1           1                    4                    3           ***
1           2                    13                   2           ,,
2           1                    4                    1           .
2           2                    8                    2           &&
2           3                    13                   1           %
3           1                    3                    1           /
3           2                    7                    1           >
3           3                    10                   1           ?
Alan Burstein
  • 7,770
  • 1
  • 15
  • 18