4

I have a Sql Server 2K8 R2 DB with a table that have a column containings multiples values, separated by (char 13 and char 10).

I'm building a script to import the data in a properly normalized schema.

My source table contains something like this :

ID   |    Value
________________
1    |    line 1
          line 2
________________
2    |    line 3
________________
3    |    line 4
          line 5
          line 6
________________

and so on.

[edit] FYI, Id is integer and value is nvarchar(3072) [/edit]

What I want is to query the table to ouput somethnig like this :

ID   |    Value
________________
1    |    line 1
________________
1    |    line 2
________________
2    |    line 3
________________
3    |    line 4
________________
3    |    line 5
________________
3    |    line 6
________________

I've read many answer here on SO, and also around the web, and I find that using master..sptvalues should be the solution. Especially, I tried to reprodude the solution of the question Split one column into multiple rows. However, without success (suspecting having two chars causing problems).

By now, I wrote this query :

SELECT
    T.ID,
    T.Value, 
    RIGHT(LEFT(T.Value,spt.Number-1),
    CHARINDEX(char(13)+char(10),REVERSE(LEFT(char(13)+char(10)+T.Value,spt.Number-1)))) as Extracted
FROM 
    master..spt_values spt,
    ContactsNew T
WHERE
    Type = 'P' AND 
    spt.Number BETWEEN 1 AND LEN(T.Value)+1
    AND
        (SUBSTRING(T.Value,spt.Number,2) = char(13)+char(10) OR SUBSTRING(T.Value,spt.Number,2) = '')

This query, unfortunately is returning :

ID   |    Value    |   Extracted
________________________________   
1    |    line 1   |   <blank>
          line 2   |   
________________________________   
1    |    line 1   |   line 2
          line 2   |   
________________________________   
2    |    line 3   |   <blank>
________________________________   
3    |    line 4   |   <blank>
          line 5   |
          line 6   |
________________________________   
3    |    line 4   |   line 5
          line 5   |   line 6
          line 6   |
________________________________   
3    |    line 4   |   line 6
          line 5   |
          line 6   |
________________________________  

<blank> is an empty string, not null string.

I'd appreciate some help to tune my query.

[Edit2] My source table contains less than 200 records, and performance is not a requirement, so I'm targeting a simple solution rather than an efficient one [Edit2]

[Edit3] The source database is readonly. I can't add stored procedure, function, or clr type. I have to do this in a single query. [Edit3]

[Edit4] Something strange... it seems that whitespaces are also considered as separators.

If I run the following query :

SELECT
    T.ID,
    replace(T.Value, '#', ' '), 
    replace(RIGHT(
        LEFT(T.Value,spt.Number-1),
        CHARINDEX( char(13) + char(10),REVERSE(LEFT(char(10) + char(13)+T.Value,spt.Number-0)))
        ), '#', ' ')
FROM 
    master..spt_values spt,
    (   
        select contactID, 
        replace(Value,' ', '#') Value
        from ContactsNew where Value is not null
    ) T
WHERE
    Type = 'P' AND 
    spt.Number BETWEEN 1 AND LEN(T.Value)+1
    AND
        (SUBSTRING(T.Value,spt.Number,2) =  char(13) + char(10) OR SUBSTRING(T.Value,spt.Number,1) = '')

I got the correct number of returns (however, still having wrong values), while running this query :

SELECT
    T.ID,
    T.Value, 
    RIGHT(
        LEFT(T.Value,spt.Number-1),
        CHARINDEX( char(13) + char(10),REVERSE(LEFT(char(10) + char(13)+T.Value,spt.Number-0)))
        )
FROM 
    master..spt_values spt,
    (   
        select contactID, 
        Value
        from ContactsNew where Value is not null
    ) T
WHERE
    Type = 'P' AND 
    spt.Number BETWEEN 1 AND LEN(T.Value)+1
    AND
        (SUBSTRING(T.Value,spt.Number,2) =  char(13) + char(10) OR SUBSTRING(T.Value,spt.Number,1) = '')

splits on spaces also

Community
  • 1
  • 1
Steve B
  • 36,818
  • 21
  • 101
  • 174

2 Answers2

1

EDIT #1: I've deleted original answer text. Try following query. I slightly modified your logic. If you should have any questions about it, don't hesitate to ask in comment. If You need another split delimiter just introduce another nested query to replace that delimiter with CHAR(13)+CHAR(10).

SELECT 
* 
FROM 
(
    SELECT
        T.ID,
        T.Value,
        CASE
            WHEN CHARINDEX(CHAR(13) + CHAR(10), SUBSTRING(T.Value, spt.number, LEN(T.Value) - spt.Number + 1)) > 0 THEN
                LEFT(
                    SUBSTRING(T.Value, spt.number, LEN(T.Value) - spt.Number + 1), 
                    CHARINDEX(CHAR(13) + CHAR(10), SUBSTRING(T.Value, spt.number, LEN(T.Value) - spt.Number + 1)) - 1)
 /* added by Steve B. see comments for the reasons */
        when len(T.Value) = spt.Number then right(t.Value, spt.number -1) 
 /* end of edit */
            ELSE
                SUBSTRING(T.Value, spt.number, LEN(T.Value) - spt.Number + 1)
        END EXTRACTED
    FROM 
        master..spt_values spt,
        ContactsNew T
    WHERE
        Type = 'P' AND 
        spt.Number BETWEEN 1 AND LEN(T.Value)+1
) X
WHERE 
    EXTRACTED <> '' AND
    (
        LEFT(X.VALUE, LEN(EXTRACTED)) = EXTRACTED OR 
        X.Value LIKE '%' + CHAR(13) + CHAR(10) + EXTRACTED + CHAR(13) + CHAR(10) + '%' OR
        X.Value LIKE '%' + CHAR(13) + CHAR(10) + EXTRACTED
    )
Steve B
  • 36,818
  • 21
  • 101
  • 174
Filip Popović
  • 2,637
  • 3
  • 18
  • 18
  • Unfortunately, I can't add a function to the source database (production server). I have to be able to do this in a single SQL Query. – Steve B Nov 22 '11 at 11:13
  • @SteveB, ok I modified my answer. – Filip Popović Nov 22 '11 at 11:57
  • Note that this will not work if the string is the full length of the column - there are only 2048 `type = 'P'` values in `spt_values` (the column is 3072) – Ed Harper Nov 22 '11 at 12:03
  • @FilipPopović: it works like a charm. Thanks very much ! I'll have to parse the query to understand how it works, but in all case, my query is up. FYI, the lengthiest value is 180 chars long... column size won't be a problem. – Steve B Nov 22 '11 at 12:21
  • @FilipPopović: I found a bug in your query. If you have a value like `S starts with and ends with an s S`, it will find that `s=s` in the part `LEFT(X.VALUE, LEN(EXTRACTED)) = EXTRACTED` (in the final where query). I have difficulties to find an solution. My values are mostly containing names, sometimes in the format `S. Foos`, and this string is found as `S. foos` AND `s` (because the first char is equals to the last char. – Steve B Nov 23 '11 at 15:50
  • Maybe I've got it : I added ` when len(T.Name) = spt.Number then right(t.Name, spt.number -1) ` within the CASE... I edit your post accordingly. Not sure to understand why, but the problem disappeared in this case – Steve B Nov 23 '11 at 16:05
  • Glad You found the solution, I've overlooked case with names with same character at the beginning and at the end. – Filip Popović Nov 23 '11 at 20:01
0

A sample query showing how to perform this kind of operation against some test data similar to described.

If you aren't able to declare variables in your final statement you can find/replace them for their values, but it makes things a bit simpler.

This works by replacing CR+LF with a single character before doing the split.
If '|' is in use in your data, select another single character which isn't to use as the temporary delimiter.

declare @crlf nvarchar(2) = char(10) + char(13)
declare @cDelim nvarchar(1) = N'|'

-- test data
declare @t table
(id int
,value nvarchar(3072))

insert @t
select 1, 'line1' + @crlf + 'line2'
union all select 2, 'line3'
union all select 3, 'line4' + @crlf + 'line5' + @crlf + 'line6'
-- /test data



;WITH charCTE
AS
( 
        --split the string into a dataset
        SELECT  D.id, D.value, SUBSTRING(D.s,n,CHARINDEX(@cDelim, D.s + @cDelim,n) -n) AS ELEMENT
        FROM (SELECT id, value, REPLACE(value,@crlf,@cDelim) as s from @t)    AS D
        JOIN (SELECT TOP 3072 ROW_NUMBER() OVER (ORDER BY a.type, a.number, a.name) AS n
              FROM master.dbo.spt_values a 
              CROSS 
              JOIN master.dbo.spt_values b 
              ) AS numsCte
        ON n <= LEN(s)
        AND SUBSTRING(@cDelim + s,n,1) = @cDelim 
)
SELECT id, ELEMENT
FROM charCTE
order by id, element
Ed Harper
  • 21,127
  • 4
  • 54
  • 80