2

I have some data that includes a column with delimited data. There are multiple records in the same column essentially:

A0434168.A2367943.A18456972.A0135374.A0080362.A0084546.A0100991.A0064071.A0100858

I'm trying to split up the data in this column into separate rows in another table. The values are of variable length, and delimited by periods. I've been attempting to create a lookup table for this data, using a cursor. Due to the volume of data, the cursor is unreasonably slow.

My cursor looks like the following:

DECLARE @ptr nvarchar(160)
DECLARE @aui nvarchar(15)
DECLARE @getmrhier3 CURSOR 

SET @getmrhier3 = CURSOR FOR
    SELECT  cast(ptr as nvarchar(160)),aui
    FROM    mrhier3
    FORWARD_ONLY
OPEN @getmrhier3
FETCH NEXT
    FROM @getmrhier3 INTO @ptr, @aui

WHILE @@FETCH_STATUS = 0
BEGIN
    if(len(@ptr) > 0)
    begin
        if(charindex('.',@ptr) > 0)
        begin
            insert into mrhierlookup(hieraui,aui)
            values      (substring(@ptr,0,charindex('.',@ptr)),@aui)
            
            update  mrhier3
            set     ptr = substring(@ptr,charindex('.',@ptr)+1,LEN(@ptr))
            where   aui = @aui 
              and   ptr = @ptr
        end
        else
        begin
            insert into mrhierlookup(hieraui,aui)
            values      (@ptr,@aui)
            
            update  mrhier3
            set     ptr = ''
            where   aui = @aui 
              and   ptr = @ptr
        end
    end
    FETCH NEXT
        FROM @getmrhier3 INTO @ptr, @aui
END

CLOSE       @getmrhier3
DEALLOCATE  @getmrhier3

The current version of the cursor just works on the leading value of the column. All lengths are arbitrary. The column is at most ~150 characters long.

With the current dataset, building the lookup table will likely take days. It will have several million records.

Is there a better way to efficiently (quickly) parse out this data into a separate table for the purpose of performing join operations more quickly?

outis
  • 75,655
  • 22
  • 151
  • 221
user1003221
  • 453
  • 1
  • 5
  • 17
  • 4
    No offense, but how do you read your own code? Does it give you nightmares? Indents, carriage returns, white space and statement terminators go a LONG way. – Aaron Bertrand Apr 16 '12 at 22:22
  • Seems similar to http://stackoverflow.com/questions/314824/t-sql-opposite-to-string-concatenation-how-to-split-string-into-multiple-reco or http://stackoverflow.com/questions/2647/split-string-in-sql – xQbert Apr 16 '12 at 22:22
  • @xQbert: similar, but still different because of column orientation. – RBarryYoung Apr 16 '12 at 22:31

1 Answers1

2

Create a split function:

CREATE FUNCTION dbo.SplitStrings(@List NVARCHAR(MAX))
RETURNS TABLE
AS
   RETURN ( SELECT Item FROM
       ( SELECT Item = x.i.value('(./text())[1]', 'nvarchar(max)')
         FROM ( SELECT [XML] = CONVERT(XML, '<i>'
         + REPLACE(@List, '.', '</i><i>') + '</i>').query('.')
           ) AS a CROSS APPLY [XML].nodes('i') AS x(i) ) AS y
       WHERE Item IS NOT NULL
   );
GO

Then get rid of all the cursor and looping nonsense and do this:

INSERT dbo.mrhierlookup
(
  heiraui,
  aui
)
SELECT s.Item, m.aui
  FROM dbo.mrhier3 AS m
  CROSS APPLY dbo.SplitStrings(m.ptr) AS s
GROUP BY s.Item, m.aui;
Aaron Bertrand
  • 272,866
  • 37
  • 466
  • 490