1

I have data that has come over from a hierarchical database, and it often has columns that contain data that SHOULD be in another table, if the original database had been relational.

The column's data is formatted in pairs, with LABEL\VALUE with a space as the delimiter, like this:

LABEL1\VALUE LABEL2\VALUE LABEL3\VALUE

There is seldom more than one pair in a record, but there as many as three. There are 24 different possible Labels. There are other columns in this table, including the ID. I have been able to convert this column into a sparse array without using a cursor, with columns for ID, LABEL1, LABEL2, etc....

But this is not ideal for using in another query. My other option it to use a cursor, loop through the entire table once and write to a temp table, but I can't see to get it to work the way I want. I have been able to do it in just a few minutes in VB.NET, using a couple of nested loops, but can't manage to do it in T-SQL even using cursors. Problem is, that I would have to remember to run this program every time before I want to use the table it creates. Not ideal.

So, I read a row, split out the pairs from 'LABEL1\VALUE LABEL2\VALUE LABEL3\VALUE' into an array, then split them out again, then write the rows

ID, LABEL1, VALUE

ID, LABEL2, VALUE

ID, LABEL3, VALUE

etc...

I realize that 'splitting' the strings here is the hard part for SQL to do, but it just seems a lot more difficult that it needs to be. What am I missing?

Community
  • 1
  • 1
  • Have a look at this question maybe this can help http://stackoverflow.com/questions/21211605/query-to-return-the-number-of-times-a-specific-value-occurs-within-a-string/21211941#21211941 – M.Ali Jan 21 '14 at 00:00
  • FYI for all that helped, turns out the analyst had their info about the data wrong. The column in question, which was packaging info about products was not in the format they thought. – David Griswold Jan 22 '14 at 21:33

3 Answers3

0

With only three values, you can manage to do this by brute force:

select (case when rest like '% %'
             then left(rest, charindex(' ', rest) - 1)
             else rest
        end) as val2,
       (case when rest like '% %'
             then substring(col, charindex(' ', col) + 1, 1000)
        end) as val3
from (select (case when col like '% %'
                   then left(col, charindex(' ', col) - 1)
                   else col
              end) as val1,
             (case when col like '% %'
                   then substring(col, charindex(' ', col) + 1, 1000)
              end) as rest
      from t
     ) t
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • This would have worked, but I should have said that the in the current data I have only seen three label\value pairs, but that did not preclude there being more in the future. – David Griswold Jan 21 '14 at 16:17
  • @DavidGriswold . . . The question says "there as many as three". A user-defined function or recursive CTE are the ways of doing this more generally. – Gordon Linoff Jan 21 '14 at 16:34
  • You are correct, that is what I said. I would upvote you if I could :( – David Griswold Jan 21 '14 at 17:33
0

Assuming that the data label contains no . characters, you can use a simple function for this:

CREATE FUNCTION [dbo].[SplitGriswold]
(
  @List   NVARCHAR(MAX),
  @Delim1 NCHAR(1),
  @Delim2 NCHAR(1)
)
RETURNS TABLE
AS
  RETURN
  ( 
    SELECT 
      Val1 = PARSENAME(Value,2),
      Val2 = PARSENAME(Value,1)
    FROM 
    (
      SELECT REPLACE(Value, @Delim2, '.') FROM
      ( 
        SELECT LTRIM(RTRIM(SUBSTRING(@List, [Number],
          CHARINDEX(@Delim1, @List + @Delim1, [Number]) - [Number])))
        FROM (SELECT Number = ROW_NUMBER() OVER (ORDER BY name)
          FROM sys.all_objects) AS x
          WHERE Number <= LEN(@List)
          AND SUBSTRING(@Delim1 + @List, [Number], LEN(@Delim1)) = @Delim1
       ) AS y(Value)
     ) AS z(Value)
   );
GO

Sample usage:

DECLARE @x TABLE(ID INT, string VARCHAR(255));

INSERT @x VALUES
  (1, 'LABEL1\VALUE LABEL2\VALUE LABEL3\VALUE'),
  (2, 'LABEL1\VALUE2 LABEL2\VALUE2');

SELECT x.ID, t.val1, t.val2
FROM @x AS x CROSS APPLY 
 dbo.SplitGriswold(REPLACE(x.string, ' ', N'ŏ'), N'ŏ', '\') AS t;

(I used a Unicode character unlikely to appear in data above, only because a space can be problematic for things like length checks. If this character is likely to appear, choose a different one.)

Results:

ID   val1       val2
--   --------   --------
1    LABEL1     VALUE
1    LABEL2     VALUE
1    LABEL3     VALUE
2    LABEL1     VALUE2
2    LABEL2     VALUE2

If your data might have ., then you can just make the query a little more complex, without changing the function, by adding yet another character to the mix that is unlikely or impossible to be in the data:

DECLARE @x TABLE(ID INT, string VARCHAR(255));

INSERT @x VALUES
(1, 'LABEL1\VALUE.A LABEL2\VALUE.B LABEL3\VALUE.C'),
(2, 'LABEL1\VALUE2.A LABEL2.1\VALUE2.B');

SELECT x.ID, val1 = REPLACE(t.val1, N'ű', '.'), val2 = REPLACE(t.val2, N'ű', '.')
FROM @x AS x CROSS APPLY 
  dbo.SplitGriswold(REPLACE(REPLACE(x.string, ' ', 'ŏ'), '.', N'ű'), 'ŏ', '\') AS t;

Results:

ID   val1       val2
--   --------   --------
1    LABEL1     VALUE.A
1    LABEL2     VALUE.B
1    LABEL3     VALUE.C
2    LABEL1     VALUE2.A
2    LABEL2.1   VALUE2.B
Aaron Bertrand
  • 272,866
  • 37
  • 466
  • 490
  • This appears to be a complete solution to my question. I was able to simple replace x with my table data and change field names to match and it worked the first time. There are no possible '.' in the labels, so the first query worked, and not that it should matter, but all the labels are only two characters. – David Griswold Jan 21 '14 at 16:14
0

Using the SQL split string function given at referenced SQL tutorial, you can split the label-value pairs as following

SELECT
id, max(label) as label, max(value) as value
FROM (
SELECT 
    s.id, 
    label = case when t.id = 1 then t.val else NULL end,
    value = case when t.id = 2 then t.val else NULL end
FROM dbo.Split(N'LABEL1\VALUE1 LABEL2\VALUE2 LABEL3\VALUE3', ' ') s
CROSS APPLY dbo.Split(s.val, '\') t
) t
group by id

You can see that the split string function is called twice, first for splitting pairs from others. Then the second split function joined to previous one using CROSS APPLY splits labels from pairs

enter image description here

Eralper
  • 6,461
  • 2
  • 21
  • 27