5

I am working with an employee hierarchy string that is in the format of the following. These number represent employeeID numbers and how the are structured within the company, thus being able to follow the chain of management.

123|456|789|012|345|320

I am trying to take this string of data and turn it into a temp table so I can work with each of the ID's as their own value.

I tried making a function to split the string:

ALTER FUNCTION [dbo].[SplitString]
    (@String NVARCHAR(4000),
     @Delimiter NCHAR(1))
RETURNS TABLE
AS
    RETURN
        (WITH Split(stpos, endpos) AS
         (
             SELECT 0 AS stpos, CHARINDEX(@Delimiter, @String) AS endpos
             UNION ALL
             SELECT endpos + 1, CHARINDEX(@Delimiter, @String, endpos+1)
             FROM Split
             WHERE endpos > 0
         )
         SELECT 
             'Id' = ROW_NUMBER() OVER (ORDER BY (SELECT 1)),
             'Data' = SUBSTRING(@String, stpos, COALESCE(NULLIF(endpos, 0), LEN(@String) + 1))
         FROM 
             Split
)

This however resulted in the following:

Id  Data 
-------------------
1   123
2   456|7893
3   7893|012|345|
4   012|345|320
5   345|320
6   320

Is there a better way to approach this, maybe not needing a function at all or will it be required to achieve this?

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
SBB
  • 8,560
  • 30
  • 108
  • 223
  • Possible duplicate of [Turning a Comma Separated string into individual rows](http://stackoverflow.com/questions/5493510/turning-a-comma-separated-string-into-individual-rows) – Juan Carlos Oropeza Dec 20 '16 at 21:02
  • 1
    There are better ways to approach a string split function, yes. [Here](https://sqlperformance.com/2012/07/t-sql-queries/split-strings) is a good reference. But as for your issue, it can be fixed by changing one part: `COALESCE(NULLIF(endpos,0) - stpos, LEN(@String)+1)` – ZLK Dec 20 '16 at 21:13

3 Answers3

6

Without a Parse Function

Declare @YourTable table (ID int,IDList varchar(Max))
Insert Into @YourTable values
(1,'123|456|789|012|345|320'),
(2,'123|456')

Select A.ID
      ,B.*
 From @YourTable A
 Cross Apply (
                Select RetSeq = Row_Number() over (Order By (Select null))
                      ,RetVal = LTrim(RTrim(B.i.value('(./text())[1]', 'varchar(max)')))
                From (Select x = Cast('<x>'+ replace((Select A.IDList as [*] For XML Path('')),'|','</x><x>')+'</x>' as xml).query('.')) as A  
                Cross Apply x.nodes('x') AS B(i)
             ) B

Returns

ID  RetSeq  RetVal
1   1       123
1   2       456
1   3       789
1   4       012
1   5       345
1   6       320
2   1       123
2   2       456

OR with the SUPER DUPER Parse (orig source listed below / couple of tweaks)

Select A.ID
      ,B.*
 From @YourTable A
 Cross Apply [dbo].[udf-Str-Parse-8K](A.IDList,'|') B

Would Return the same as above

CREATE FUNCTION [dbo].[udf-Str-Parse-8K] (@String varchar(max),@Delimiter varchar(10))
Returns Table 
As
Return (  
    with   cte1(N)   As (Select 1 From (Values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) N(N)),
           cte2(N)   As (Select Top (IsNull(DataLength(@String),0)) Row_Number() over (Order By (Select NULL)) From (Select N=1 From cte1 a,cte1 b,cte1 c,cte1 d) A ),
           cte3(N)   As (Select 1 Union All Select t.N+DataLength(@Delimiter) From cte2 t Where Substring(@String,t.N,DataLength(@Delimiter)) = @Delimiter),
           cte4(N,L) As (Select S.N,IsNull(NullIf(CharIndex(@Delimiter,@String,s.N),0)-S.N,8000) From cte3 S)

    Select RetSeq = Row_Number() over (Order By A.N)
          ,RetVal = Substring(@String, A.N, A.L) 
    From   cte4 A
);
--Orginal Source http://www.sqlservercentral.com/articles/Tally+Table/72993/
--Much faster than str-Parse, but limited to 8K
--Select * from [dbo].[udf-Str-Parse-8K]('Dog,Cat,House,Car',',')
--Select * from [dbo].[udf-Str-Parse-8K]('John||Cappelletti||was||here','||')

Edit - Stand Alone

Declare @String varchar(max) = '123|456|789|012|345|320'
Declare @Delim  varchar(10)  = '|'

Select RetSeq = Row_Number() over (Order By (Select null))
      ,RetVal = LTrim(RTrim(B.i.value('(./text())[1]', 'varchar(max)')))
From (Select x = Cast('<x>'+ replace((Select @String as [*] For XML Path('')),@Delim,'</x><x>')+'</x>' as xml).query('.')) as A 
Cross Apply x.nodes('x') AS B(i)
John Cappelletti
  • 79,615
  • 7
  • 44
  • 66
  • I like your first version, without the function. I am trying to implement it now. Since I am already working with the source table this data is coming from, is it possible to pass a string to this instead of having it pull from a table? For example, get rid of @YourTable and have the crossApply use a string variable instead? – SBB Dec 20 '16 at 21:23
  • @SBB Sure, see EDIT - Stand Alone – John Cappelletti Dec 20 '16 at 21:26
  • @SBB Seriously, If you are in the market for a parse function, take a hard look at the "Super Duper" option. Can't take credit for it, but the performance increase was dramatic. – John Cappelletti Dec 20 '16 at 21:35
  • Hi John, the comment I placed below M.Alis answer would be fine for you too. You might want to read it... – Shnugo Dec 20 '16 at 22:34
  • @Shnugo Sorry, I neglected to thank you for this tip. – John Cappelletti Dec 22 '16 at 11:50
2

If you need a string "splitter" the fastest one available for 2012 (pre- 2016) is going to be found here. This will blow the doors off of anything posted thusfar. If your items/tokens are all the same size then an even faster method would be this:

DECLARE @yourstring varchar(8000) = '123|456|789|012|345|320';

WITH E(N) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))t(v)),
iTally(N) AS (SELECT TOP ((LEN(@yourstring)/4)+1) ROW_NUMBER() OVER (ORDER BY (SELECT 1)) 
              FROM e a, e b, e c, e d)
SELECT itemNumber = ROW_NUMBER() OVER (ORDER BY N), item = SUBSTRING(@yourstring, ((N*4)-3), 3) 
FROM iTally;

Results:

itemNumber           item
-------------------- ----
1                    123
2                    456
3                    789
4                    012
5                    345
6                    320

I write more about this and provide examples of how to put this logic into a function here.

Alan Burstein
  • 7,770
  • 1
  • 15
  • 18
1

I use this version of the Split function.

CREATE FUNCTION [dbo].[Split]
(
  @delimited nvarchar(max),
  @delimiter nvarchar(100)
) RETURNS @t TABLE
(
-- Id column can be commented out, not required for sql splitting string
  id int identity(1,1), -- I use this column for numbering splitted parts
  val nvarchar(max)
)
AS
BEGIN
  declare @xml xml
  set @xml = N'<root><r>' + replace(@delimited,@delimiter,'</r><r>') + '</r></root>'

  insert into @t(val)
  select
    r.value('.','varchar(max)') as item
  from @xml.nodes('//root/r') as records(r)

  RETURN
END

You Query would look something like....

SELECT * 
FROM TableName t 
  CROSS APPLY [dbo].[Split](t.EmpIDs, '|')
M.Ali
  • 67,945
  • 13
  • 101
  • 127
  • You might think about the following enhancement: `replace((SELECT @delimited AS [*] FOR XML PATH('')),@delimiter,'')`. Doing so, your function would be able to split strings containing forbidden characters. Otherwise your function would break with `SELECT * FROM dbo.Split('this,is,,for,< & >',',')` – Shnugo Dec 20 '16 at 22:32
  • Btw: with this enhancement all forbidden characters are turned into their entities (e.g. `&lg;`). That means, that - if you delimiter was the semicolon or the ampersand - you could get into troubles again. One way is to replace the delimiter first with something else, do the encoding trick and use the replacement value for the split... – Shnugo Dec 20 '16 at 22:40