0

I have a data source which contains data in delimited fields which exist in a staging area in SQL Server. I'd like to transform this data into many rows so it is easier to work with. This differs from the numerous other questions and answers on similar topics in that I have multiple fields where this delimited data exists. Here is an example of what my data looks like:

 ID | Field | Value
 ---+-------+------
 1  | a,b,c | 1,2,3
 2  | a,c   | 5,2

And this is the desired output:

ID | Field | Value
---+-------+------    
1  | a     | 1
1  | b     | 2
1  | c     | 3
2  | a     | 5
2  | c     | 2

My code so far uses the XML parsing method like the one mentioned here: Turning a Comma Separated string into individual rows I needed to extend it to join each field to its corresponding value which I have done by generating a row_number for each ID and then matching based on the ID and this row_number.

My issue is that it is painfully slow so I wondered if anyone has any more performant methods?

select      
    [Value].ID, [Field], [Value]
from        
    (select
         A.ID, Split.a.value('.', 'varchar(100)') as [Value],
         row_number() over (partition by ID order by Split.a) as RowNumber
     from 
         (select 
              ID, cast('<M>' + replace([Value], ',', '</M><M>') + '</M>' as xml) as [Value]
          from
              #source_table
          where
              [Field] not like '%[<>&%]%' and [Value] not like '%[<>&%]%') as A
     cross apply
         [Value].nodes ('/M') as Split(a)
    ) [Value]
inner join  
    (
        select
            A.ID, Split.a.value('.', 'varchar(100)') as [Field],
            row_number() over (partition by A.ID order by Split.a) as RowNumber
        from 
            (select
                 ID, cast('<M>' + replace([Field], ',', '</M><M>') + '</M>' as xml) as [Field]
             from  
                 #source_table
             where
                 [Field] not like '%[<>&%]%' and [Value] not like '%[<>&%]%') as A
        cross apply 
            [Field].nodes ('/M') as Split(a)
    ) [Field] on [Value].ID = [Field].ID and [Value].RowNumber = [Field].RowNumber
marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459

3 Answers3

1

Here is an approach using the splitter from Jeff Moden. http://www.sqlservercentral.com/articles/Tally+Table/72993/ One nice feature of that splitter is that it returns the ordinal position of each element so you can use it for joins and such.

Starting with some data.

declare @Something table
(
    ID int
    , Field varchar(50)
    , Value varchar(50)
)

insert @Something values
(1, 'a,b,c', '1,2,3')
, (2, 'a,c', '5,2')
;

Since you have two sets of delimited data you will be forced to split this for each set of delimited values. Here is how you can leverage this splitter to accomplish this.

with Fields as
(
    select *
    from @Something s
    cross apply dbo.DelimitedSplit8K(s.Field, ',') f
)
, Value as
(
    select *
    from @Something s
    cross apply dbo.DelimitedSplit8K(s.Value, ',') v
)

select f.ID
    , Field = f.Item
    , Value = v.Item
from Fields f
join Value v on v.ItemNumber = f.ItemNumber and v.ID = f.ID

If at all possible it would be best to see if you can change whatever process it is that is populating your source data so it is normalized and not delimited because it is a pain to work with.

Sean Lange
  • 33,028
  • 3
  • 25
  • 40
0

One method is a recursive CTE:

with cte as (
      select id, cast(NULL as varchar(max)) as field, cast(NULL as varchar(max)) as value, field as field_list, value as value_list, 0 as lev
      from t
      union all
      select id, left(field_list, charindex(',', field_list + ',') - 1),
             left(value_list, charindex(',', value_list + ',') - 1),
             substring(field_list, charindex(',', field_list + ',') + 1, len(field_list)),
             substring(value_list, charindex(',', value_list + ',') + 1, len(value_list)),
             1 + lev
      from cte
      where field_list <> '' and value_list <> ''
     )
select *
from cte
where lev > 0;

Here is an example of how it works.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Hm... your solution seems to be a bit buggy: it returns the IDs, but neither the fields nor the values... most certainbly because you select NULL in the first iteration and then substring of NULL in the next... – Tyron78 Oct 10 '17 at 15:31
0

Basing on @Gordon Linoff s query here another recursive cte:

DECLARE @t TABLE(
  ID int
  ,Field VARCHAR(MAX)
  ,Value VARCHAR(MAX)
)

INSERT INTO @t VALUES
(1, 'a,b,c', '1,2,3')
,(2, 'a,c', '5,2')
,(3, 'x', '7');


with cte as (
      select ID
            ,SUBSTRING(Field, 1, CASE WHEN CHARINDEX(',', Field) > 0 THEN CHARINDEX(',', Field)-1 ELSE LEN(Field) END) AS Field
            ,SUBSTRING(Value, 1, CASE WHEN CHARINDEX(',', Value) > 0 THEN CHARINDEX(',', Value)-1 ELSE LEN(Value) END) AS Value
            ,SUBSTRING(Field, CASE WHEN CHARINDEX(',', Field) > 0 THEN CHARINDEX(',', Field)+1 ELSE 1 END, LEN(Field)-CASE WHEN CHARINDEX(',', Field) > 0 THEN CHARINDEX(',', Field) ELSE 0 END) as field_list
            ,SUBSTRING(Value, CASE WHEN CHARINDEX(',', Value) > 0 THEN CHARINDEX(',', Value)+1 ELSE 1 END, LEN(Value)-CASE WHEN CHARINDEX(',', Value) > 0 THEN CHARINDEX(',', Value) ELSE 0 END) as value_list
            ,0 as lev
      from @t
      WHERE CHARINDEX(',', Field) > 0
      UNION ALL
      select ID
            ,SUBSTRING(field_list, 1, CASE WHEN CHARINDEX(',', field_list) > 0 THEN CHARINDEX(',', field_list)-1 ELSE LEN(field_list) END) AS Field
            ,SUBSTRING(value_list, 1, CASE WHEN CHARINDEX(',', value_list) > 0 THEN CHARINDEX(',', value_list)-1 ELSE LEN(value_list) END) AS Value
            ,CASE WHEN CHARINDEX(',', field_list) > 0 THEN SUBSTRING(field_list, CHARINDEX(',', field_list)+1, LEN(field_list)-CHARINDEX(',', field_list)) ELSE '' END as field_list
            ,CASE WHEN CHARINDEX(',', value_list) > 0 THEN SUBSTRING(value_list, CHARINDEX(',', value_list)+1, LEN(value_list)-CHARINDEX(',', value_list)) ELSE '' END as value_list
            ,lev + 1
      from cte
      WHERE LEN(field_list) > 0
     )
select ID, Field, Value
from cte
UNION ALL
SELECT ID, Field, Value
  FROM @t
  WHERE CHARINDEX(',', Field) = 0
ORDER BY ID, Field
OPTION (MAXRECURSION 0)
Tyron78
  • 4,117
  • 2
  • 17
  • 32