2

I'm working with a legacy tables in MS SQL Server 2008 and need to create a view to display the data in a way a new system needs it. Here's the legacy table.

Table

id    userid    sport1                        sport1level               sport2                        sport2level
----------------------------------------------------------------------------------------------------------------------------------
1     11        Baseball                      Varsity                   Baseball                      Recreational
2     22        Baseball,Basketball           Varsity,Junior Varsity    Baseball                      Varsity
3     33        Soccer                        Varsity                   Soccer,Track & Field          Recreational,Intramural
4     44        null                          null                      Tennis                        Varsity
5     55        Volleyball                    Varsity                   null                          null
6     66        Baseball,Basketball           Varsity,Varsity           Soccer,Football               Varsity,Varsity
7     77        Baseball,Basketball,Rowing    Varsity,Varsity,Varsity   Soccer,Football,Volleyball    Varsity,Varsity,Recreational

This is the result we are looking for:

Result

id    userid    sport            sportlevel1       sportlevel2
---------------------------------------------------------------------------------------
1_1   11        Baseball         Varsity           Recreational
2_1   22        Baseball         Varsity           Varsity
2_2   22        Basketball       Junior Varsity    null
3_1   33        Soccer           Varsity           Recreational
3_2   33        Track & Field    null              Intramural
4_1   44        Tennis           null              Varsity
5_1   55        Volleyball       Varsity           null
6_1   66        Baseball         Varsity           null
6_2   66        Basketball       Varsity           null
6_3   66        Soccer           null              Varsity
6_4   66        Football         null              Varsity
7_1   77        Baseball         Varsity           null
7_2   77        Basketball       Varsity           null
7_3   77        Rowing           Varsity           null
7_4   77        Soccer           null              Varsity
7_5   77        Football         null              Varsity
7_6   77        Volleyball       null              Recreational

Key things to note:

  • the original table may contain more than 2 comma separated values (I added a 7th row to show this
  • id from legacy table is an int but not necessarily needed this way in new table
  • you may have noticed that the id for the new table is a concatenation of the {original id}_{incremental sport count per user}. Where {incremental sport count per user}, is a sub id if you will, for each sport chosen by a user. e.g.: userid = 2 has 2 distinct sports selected: baseball and basketball, even though baseball falls in two columns.

If I have to create helper functions or whatever, please let me know.

If you have any questions or need more info, please let me know.

Please don't try to ask why it's structured this way or try to give a better structure to the new format. Thanks

RoLYroLLs
  • 3,113
  • 4
  • 38
  • 57
  • Some hints here: http://stackoverflow.com/questions/5738834/how-to-parse-a-comma-delimited-string-of-numbers-into-a-temporary-orderid-table you can put this into a function that returns a table or insert into a temp table, etc... – Mike Feb 17 '15 at 22:16
  • I've been able to get the result with CTE, but not been able to get the `id` column the way they need it. I'd put it in here, but I want a cleaner solution as I don't like the way my colleagues wrote the CTE themselves. – RoLYroLLs Feb 17 '15 at 22:18
  • What is a CME? Do you mean cte? If so, can you share that? – Sean Lange Feb 17 '15 at 22:19
  • Rofl! yes sorry! Fixed. – RoLYroLLs Feb 17 '15 at 22:20
  • 1
    LOL. Can you share the cte? Also, could you post this as consumable ddl and data so we can help. – Sean Lange Feb 17 '15 at 22:27
  • Ok. I'll have to create something for you as there are more columns, names are different, etc., but concept is the same. – RoLYroLLs Feb 17 '15 at 22:28
  • Can you have something where a sport is repeated buth "out of order" like 'Baseball, Basketball' and then 'Football, Baseball'? – shawnt00 Feb 17 '15 at 23:36

4 Answers4

2

Not too elegant but it does the job:

WITH 
Data AS(
    SELECT *
    FROM (
    VALUES ( 1, 11, 'Baseball           ',               'Varsity               ',                 'Baseball            ', 'Recreational           ' )
    ,      ( 2, 22, 'Baseball,Basketball',               'Varsity,Junior Varsity',                 'Baseball            ', 'Varsity                ' )
    ,      ( 3, 33, 'Soccer             ',               'Varsity               ',                 'Soccer,Track & Field', 'Recreational,Intramural' )
    ,      ( 4, 44, NULL,                                NULL,                                     'Tennis              ', 'Varsity                ' )
    ,      ( 5, 55, 'Volleyball         ',               'Varsity               ',                 NULL,                   NULL                      )
    ,      ( 6, 66, 'Baseball,Basketball',               'Varsity,Varsity       ',                 'Soccer,Football     ', 'Varsity,Varsity        ' )
    ,      ( 7, 77, 'Baseball,Football,Rugby,Wrestling', 'Varsity,Varsity,Varsity,Junior Varsity', 'Rugby',                'Recreational'            )
    ) AS T(id, userid, sport1, sport1level, sport2, sport2level)
),
SplitValues AS(
    -- Substring logic is in the Anchor record ommited to prevent repetition of 
    -- code, therefor level 0 needs to be ignored
    SELECT
          id
        , userid
        , [level] = 0
        , sport1                   = sport1       
        , sport1level              = sport1level  
        , sport2                   = sport2       
        , sport2level              = sport2level  
        , sport1Remainder          = sport1       
        , sport1levelRemainder     = sport1level  
        , sport2Remainder          = sport2       
        , sport2levelRemainder     = sport2level  
    FROM data
    UNION ALL
    SELECT
           id
         , userid
         , [level] = [level] + 1
         , sport1                   = SUBSTRING(sport1Remainder, 1, ISNULL(NULLIF(CHARINDEX(',', sport1Remainder)- 1, -1), LEN(sport1Remainder)))
         , sport1level              = SUBSTRING(sport1levelRemainder, 1, ISNULL(NULLIF(CHARINDEX(',', sport1levelRemainder)- 1, -1), LEN(sport1levelRemainder)))
         , sport2                   = SUBSTRING(sport2Remainder, 1, ISNULL(NULLIF(CHARINDEX(',', sport2Remainder)- 1, -1), LEN(sport2Remainder)))
         , sport2level              = SUBSTRING(sport2levelRemainder, 1, ISNULL(NULLIF(CHARINDEX(',', sport2levelRemainder)- 1, -1), LEN(sport2levelRemainder)))
         , sport1Remainder          = SUBSTRING(sport1Remainder, NULLIF(CHARINDEX(',', sport1Remainder)+1, 1), LEN(sport1Remainder))
         , sport1levelRemainder     = SUBSTRING(sport1levelRemainder, NULLIF(CHARINDEX(',', sport1levelRemainder)+1, 1), LEN(sport1levelRemainder))
         , sport2Remainder          = SUBSTRING(sport2Remainder, NULLIF(CHARINDEX(',', sport2Remainder)+1, 1), LEN(sport2Remainder))
         , sport2levelRemainder     = SUBSTRING(sport2levelRemainder, NULLIF(CHARINDEX(',', sport2levelRemainder)+1, 1), LEN(sport2levelRemainder))
    FROM SplitValues
    WHERE sport1Remainder IS NOT NULL
        OR sport2Remainder IS NOT NULL
),
SplitRowsWithDifferentSport AS(
    SELECT id
         , userid
         , sport1
         , sport1level
         , sport1level2 = CASE WHEN sport1 = sport2 THEN sport2level END
    FROM SplitValues
    WHERE [level] <> 0

    UNION ALL

    SELECT id
         , userid
         , sport2
         , null
         , sport1level2 = sport2level
    FROM SplitValues
    WHERE ISNULL(sport1, '') <> sport2
        AND [level] <> 0
)
SELECT id           = CAST(S.id AS VARCHAR(max)) + '_' + 
                      CAST(ROW_NUMBER() OVER (PARTITION BY S.userid ORDER BY s.id) AS VARCHAR(max)) 
     , S.sport1
     , sport1level1 = MAX(S.sport1level)
     , sport1level2 = MAX(S.sport1level2)
FROM SplitRowsWithDifferentSport AS S
WHERE S.sport1 IS NOT NULL 
GROUP BY S.ID, S.userid, S.sport1
ORDER BY id

EDIT: Changed the SplitValues CTE to allow for multiple sports in a single column. A maximum of 99 sports per row is now supported. If you need to go even higher than that, add OPTION(MAXRECURSION 0) to have no limit at all.

EDIT2: Added group by to get rid of same sport on multiple rows.

MWillemse
  • 960
  • 5
  • 9
  • Wow! You are awesome. I was preparing the CTE requested by other users, but yours works great! I'll modify it to use our field names and let you know how it goes! – RoLYroLLs Feb 17 '15 at 22:58
  • You'll probably want to tweak the `row_number()` expression a little so it sorts by sport name or (column and position). It could potentially change as written now I think. – shawnt00 Feb 17 '15 at 23:44
  • @MWillemse Your solution works great with the real data. However, i do see one issue I did not show due to lack of space nor did I mention it. Each of the sports columns can have multiple sports, not a max of 2, which my data shows. I'll update my question referencing this. – RoLYroLLs Feb 18 '15 at 16:59
  • Your solution is incredibly fast! I love it! However, I found an issue with it just now as I was verifying the data. I will update my initial question if need-be, but if if you change row 7 to `( 7, 77, 'Baseball,Football,Rugby,Wrestling','Varsity,Varsity,Varsity,Junior Varsity','Rugby','Recreational' )` you will get Rugby twice. I love your solution the best since it's really fast taking only 5 seconds to process all 130,000 rows with no `where` clause. Any further help would be great! I'll try to tweak it myself as well. – RoLYroLLs Feb 20 '15 at 16:34
  • @MWillemse, Thanks! it's perfect! runs very quick (2 seconds with 130k records) – RoLYroLLs Feb 23 '15 at 20:57
1

The output you expect is not optimized, because it provides a way to produce nulls in SportLevel2. You should store each sport and each level as a separete row, for example:

nid userid  SportName   SportLevel
1_1 11  Baseball        Varsity
1_2 11  Baseball        Recreational
2_1 22  Baseball        Varsity
2_2 22  Baseball        Varsity
3_1 33  Soccer          Varsity
3_2 33  Soccer          Recreational
3_3 33  Track & Field   Recreation
4_2 44  Tennis          Varsity
5_1 55  Volleyball      Varsity
6_1 66  Baseball        Varsity
6_2 66  Soccer          Varsity
6_2 66  Basketball      Varsity
6_3 66  Football        Varsity

To achieve that, you can use CTE as follow:

DECLARE @tmp TABLE(id INT IDENTITY(1,1), userid INT , sport1 VARCHAR(150), sport1level VARCHAR(150), sport2 VARCHAR(150), sport2level VARCHAR(150))

INSERT INTO @tmp (userid, sport1, sport1level, sport2, sport2level)
VALUES(11, 'Baseball', 'Varsity', 'Baseball', 'Recreational'),
(22, 'Baseball,Basketball', 'Varsity,Junior Varsity', 'Baseball', 'Varsity'),
(33, 'Soccer', 'Varsity', 'Soccer,Track & Field', 'Recreational,Intramural'),
(44, null, null, 'Tennis', 'Varsity'),
(55, 'Volleyball', 'Varsity', null, null),
(66, 'Baseball,Basketball', 'Varsity,Varsity', 'Soccer,Football', 'Varsity,Varsity')

;WITH Sports AS
(
    --1) initial value
    --   a) no commas in sport1
    SELECT id, userid, 1 AS sportid, sport1 AS SportName, sport1level  AS SportLevel, 
        NULL AS SportNameRemainder, NULL AS SportLevelRemainder 
    FROM @tmp 
    WHERE CHARINDEX(',', sport1)=0 AND CHARINDEX(',', sport1level)=0
    UNION ALL
    --   b) no commas in sport2
    SELECT id, userid, 2 AS sportid, sport2 AS SportName, sport2level  AS SportLevel, 
        NULL AS SportNameRemainder, NULL AS SportLevelRemainder 
    FROM @tmp 
    WHERE CHARINDEX(',', sport2)=0 AND CHARINDEX(',', sport2level)=0
    UNION ALL
    --   c) commas in sport1
    SELECT id, userid, 1 AS sportid, LEFT(sport1, CHARINDEX(',', sport1)-1) AS SportName, LEFT(sport1level , CHARINDEX(',', sport1level)-1) AS SportLevel, 
        RIGHT(sport1, LEN(sport1) - CHARINDEX(',', sport1)) AS SportNameRemainder, LEFT(sport1level , LEN(sport1level) - CHARINDEX(',', sport1level)) AS SportLevelRemainder 
    FROM @tmp 
    WHERE CHARINDEX(',', sport1)>0 AND CHARINDEX(',', sport1level)>0
    UNION ALL
    --   d) commas in sport2
    SELECT id, userid, 2 AS sportid, LEFT(sport2, CHARINDEX(',', sport2)-1) AS SportName, LEFT(sport2level , CHARINDEX(',', sport2level)-1) AS SportLevel, 
        RIGHT(sport2, LEN(sport2) - CHARINDEX(',', sport2)) AS SportNameRemainder, LEFT(sport2level , LEN(sport2level) - CHARINDEX(',', sport2level)) AS SportLevelRemainder 
    FROM @tmp 
    WHERE CHARINDEX(',', sport2)>0 AND CHARINDEX(',', sport2level)>0
    UNION ALL
    --2) recursive part
    SELECT id, userid, sportid +1 AS sportid, SportNameRemainder AS SportName, SportLevelRemainder AS SportLevel, NULL AS SportNameRemainder, NULL AS SportLevelRemainder
    FROM Sports
    WHERE CHARINDEX(',', SportNameRemainder)=0 AND CHARINDEX(',', SportLevelRemainder)=0
)
SELECT CONCAT(CONVERT(VARCHAR(5), id), '_', CONVERT(VARCHAR(5), sportid)) AS nid, userid, SportName, SportLevel 
FROM Sports
ORDER BY id, userid, sportid 

Feel free to change it to your needs.

Note: I'd suggest to replace string values in SportLevel to its numeric values and does not concatenate id with SportLevel, for example: Varsity might have value 1, Recreational - 2, etc. The same logic should be used to SportName. It might be necessary to join data from 2 tables. If you need help, call ;)

Maciej Los
  • 8,468
  • 1
  • 20
  • 35
  • Cool. I'll try it out in the morning. That's how I woulda designed this particular situation :) – RoLYroLLs Feb 18 '15 at 00:06
  • 1
    Thanks. This will not work for our situation as sport1 is defined as a past sport and sport2 is defined as a future interest sport. Granted, i would have probably added a column to the new system defining which sport was which type, but I have no control. Thanks for your input though! – RoLYroLLs Feb 18 '15 at 16:03
1

try this:

WITH    Data
          AS ( SELECT   *
               FROM     ( VALUES
                        ( 1, 11, 'Baseball', 'Varsity', 'Baseball', 'Recreational'),
                        ( 2, 22, 'Baseball,Basketball', 'Varsity,Junior Varsity', 'Baseball', 'Varsity'),
                        ( 3, 33, 'Soccer', 'Varsity', 'Soccer,Track & Field', 'Recreational,Intramural'),
                        ( 4, 44, NULL , NULL , 'Tennis', 'Varsity'),
                        ( 5, 55, 'Volleyball', 'Varsity', NULL , NULL ),
                        ( 6, 66, 'Baseball,Basketball', 'Varsity,Varsity', 'Soccer,Football', 'Varsity,Varsity'),
                        ( 7, 77, 'Baseball,Basketball,Rowing', 'Varsity,Varsity,Varsity', 'Soccer,Football,Volleyball', 'Varsity,Varsity,Recreational') )
                        AS T ( id, userid, sport1, sportlevel1, sport2, sportlevel2 )
             ),
        Tally
          AS ( SELECT   n = 1
               UNION ALL
               SELECT   n + 1
               FROM     Tally
               WHERE    n <= 100
             ),
        Sprt1
          AS ( SELECT   ROW_NUMBER() OVER ( PARTITION  BY Id, userid ORDER BY Id, userid , T.n ) AS RN ,
                        id ,
                        userid ,
                        CASE WHEN CHARINDEX(',', sport1) = 0 THEN sport1
                             ELSE SUBSTRING(sport1, T.n,
                                            CHARINDEX(',',
                                                      SUBSTRING(sport1 + ',',
                                                              t.n, 100)) - 1)
                        END AS sport1
               FROM     data AS d
                        JOIN Tally AS T ON COALESCE(LEN(d.sport1), 1) >= t.n
                                           AND SUBSTRING(','
                                                         + COALESCE(d.sport1,
                                                              ''), n, 1) = ','
             ),
        lvl1
          AS ( SELECT   ROW_NUMBER() OVER ( PARTITION  BY Id, userid ORDER BY Id, userid , T.n ) AS RN ,
                        id ,
                        userid ,
                        CASE WHEN CHARINDEX(',', sportlevel1) = 0
                             THEN sportlevel1
                             ELSE SUBSTRING(sportlevel1, T.n,
                                            CHARINDEX(',',
                                                      SUBSTRING(sportlevel1
                                                              + ',', t.n, 100))
                                            - 1)
                        END AS sportlevel1
               FROM     data AS d
                        JOIN Tally AS T ON COALESCE(LEN(d.sportlevel1), 1) >= t.n
                                           AND SUBSTRING(','
                                                         + COALESCE(d.sportlevel1,
                                                              ''), n, 1) = ','
             ),
        sprt2
          AS ( SELECT   ROW_NUMBER() OVER ( PARTITION  BY Id, userid ORDER BY Id, userid , T.n ) AS RN ,
                        id ,
                        userid ,
                        CASE WHEN CHARINDEX(',', sport2) = 0 THEN sport2
                             ELSE SUBSTRING(sport2, T.n,
                                            CHARINDEX(',',
                                                      SUBSTRING(sport2 + ',',
                                                              t.n, 100)) - 1)
                        END AS sport2
               FROM     data AS d
                        JOIN Tally AS T ON COALESCE(LEN(d.sport2), 1) >= t.n
                                           AND SUBSTRING(','
                                                         + COALESCE(d.sport2,
                                                              ''), n, 1) = ','
             ),
        lvl2
          AS ( SELECT   ROW_NUMBER() OVER ( PARTITION  BY Id, userid ORDER BY Id, userid , T.n ) AS RN ,
                        id ,
                        userid ,
                        CASE WHEN CHARINDEX(',', sportlevel2) = 0
                             THEN sportlevel2
                             ELSE SUBSTRING(sportlevel2, T.n,
                                            CHARINDEX(',',
                                                      SUBSTRING(sportlevel2
                                                              + ',', t.n, 100))
                                            - 1)
                        END AS sportlevel2
               FROM     data AS d
                        JOIN Tally AS T ON COALESCE(LEN(d.sportlevel2), 1) >= t.n
                                           AND SUBSTRING(','
                                                         + COALESCE(d.sportlevel2,
                                                              ''), n, 1) = ','
             ),
        final
          AS ( SELECT   COALESCE(sprt1.RN, lvl1.RN, sprt2.rn, lvl2.rn) AS RN ,
                        COALESCE(sprt1.id, lvl1.id, sprt2.id, lvl2.id) AS id ,
                        COALESCE(sprt1.userid, lvl1.userid, sprt2.userid,
                                 lvl2.userid) AS userid ,
                        sprt1.sport1 ,
                        lvl1.sportlevel1 ,
                        sprt2.sport2 ,
                        lvl2.sportlevel2
               FROM     sprt1
                        FULL JOIN lvl1 ON sprt1.id = lvl1.id
                                          AND sprt1.RN = lvl1.RN
                        FULL JOIN sprt2 ON COALESCE(sprt1.id, lvl1.id) = sprt2.id
                                           AND COALESCE(sprt1.RN, lvl1.RN) = sprt2.RN
                        FULL JOIN lvl2 ON COALESCE(sprt1.id, lvl1.id, sprt2.id) = lvl2.id
                                          AND COALESCE(sprt1.RN, lvl1.RN,
                                                       sprt2.rn) = lvl2.RN
             )
    SELECT  CONVERT(VARCHAR, id) + '_'
            + CONVERT(VARCHAR, ROW_NUMBER() OVER ( PARTITION BY id ORDER BY userid , part , sport )) AS id ,
            userid ,
            sport ,
            sportlevel1 ,
            sportlevel2
    FROM    ( SELECT    1 part ,
                        id ,
                        userid ,
                        sport1 AS sport ,
                        sportlevel1 ,
                        CASE WHEN sport1 = sport2 THEN sportlevel2
                             ELSE NULL
                        END sportlevel2
              FROM      final
              WHERE     sport1 IS NOT NULL
              UNION ALL
              SELECT    2 ,
                        id ,
                        userid ,
                        sport2 ,
                        NULL ,
                        sportlevel2
              FROM      final
              WHERE     sport2 IS NOT NULL
                        AND sport2 != COALESCE(sport1, '')
            ) FinalDataset
OPTION  ( MAXRECURSION 1000 )

output

enter image description here

code for stored procedure

--Create demo temp table for testing
IF OBJECT_ID('Tempdb..#Data') IS NOT NULL 
    DROP TABLE #Data
SELECT  *
INTO    #Data
FROM    ( VALUES ( 1, 11, 'Baseball', 'Varsity', 'Baseball', 'Recreational'),
        ( 2, 22, 'Baseball,Basketball', 'Varsity,Junior Varsity', 'Baseball', 'Varsity'),
        ( 3, 33, 'Soccer', 'Varsity', 'Soccer,Track & Field', 'Recreational,Intramural'),
        ( 4, 44, NULL , NULL , 'Tennis', 'Varsity'),
        ( 5, 55, 'Volleyball', 'Varsity', NULL , NULL ),
        ( 6, 66, 'Baseball,Basketball', 'Varsity,Varsity', 'Soccer,Football', 'Varsity,Varsity'),
        ( 7, 77, 'Baseball,Basketball,Rowing', 'Varsity,Varsity,Varsity', 'Soccer,Football,Volleyball', 'Varsity,Varsity,Recreational') )
        AS T ( id, userid, sport1, sportlevel1, sport2, sportlevel2 );

----------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------
---      CODE BELOW CAN BE USED THRO STORED PROCEDURE, VIEW DOES NOT ALLOW TEMP TABLE USAGE      ---
----------------------------------------------------------------------------------------------------
--Create temp table with sequence num
IF OBJECT_ID('Tempdb..#Tally') IS NOT NULL --<<<~this code not required in Stored procedure, can be deleted
    DROP TABLE #Tally --<<<~this code not required in Stored procedure, can be deleted
CREATE TABLE #Tally ( N INT PRIMARY KEY )
DECLARE @i INT = 1
WHILE @i < 1000 
    BEGIN
        INSERT  INTO #Tally
                SELECT  @i
        SET @i = @i + 1
    END      
----------------------------------------------------------------------------------------------------
--split sport2 field and create temp table for final result  
IF OBJECT_ID('Tempdb..#sprt1') IS NOT NULL 
    DROP TABLE #sprt1         
SELECT  ROW_NUMBER() OVER ( PARTITION  BY Id, userid ORDER BY Id, userid , T.n ) AS RN ,
        id ,
        userid ,
        CASE WHEN CHARINDEX(',', sport1) = 0 THEN sport1
             ELSE SUBSTRING(sport1, T.n,
                            CHARINDEX(',', SUBSTRING(sport1 + ',', t.n, 100))
                            - 1)
        END AS sport1
INTO    #sprt1
FROM    #data AS d
        JOIN #Tally AS T ON COALESCE(LEN(d.sport1), 1) >= t.n
                            AND SUBSTRING(',' + COALESCE(d.sport1, ''), n, 1) = ','
----------------------------------------------------------------------------------------------------
--split sportlevel1 field and create temp table for final result         
IF OBJECT_ID('Tempdb..#lvl1') IS NOT NULL 
    DROP TABLE #lvl1      
SELECT  ROW_NUMBER() OVER ( PARTITION  BY Id, userid ORDER BY Id, userid , T.n ) AS RN ,
        id ,
        userid ,
        CASE WHEN CHARINDEX(',', sportlevel1) = 0 THEN sportlevel1
             ELSE SUBSTRING(sportlevel1, T.n,
                            CHARINDEX(',',
                                      SUBSTRING(sportlevel1 + ',', t.n, 100))
                            - 1)
        END AS sportlevel1
INTO    #lvl1
FROM    #data AS d
        JOIN #Tally AS T ON COALESCE(LEN(d.sportlevel1), 1) >= t.n
                            AND SUBSTRING(',' + COALESCE(d.sportlevel1, ''), n,
                                          1) = ','
----------------------------------------------------------------------------------------------------
--split sport2 field and create temp table for final result           
IF OBJECT_ID('Tempdb..#sprt2') IS NOT NULL 
    DROP TABLE #sprt2
SELECT  ROW_NUMBER() OVER ( PARTITION  BY Id, userid ORDER BY Id, userid , T.n ) AS RN ,
        id ,
        userid ,
        CASE WHEN CHARINDEX(',', sport2) = 0 THEN sport2
             ELSE SUBSTRING(sport2, T.n,
                            CHARINDEX(',', SUBSTRING(sport2 + ',', t.n, 100))
                            - 1)
        END AS sport2
INTO    #sprt2
FROM    #data AS d
        JOIN #Tally AS T ON COALESCE(LEN(d.sport2), 1) >= t.n
                            AND SUBSTRING(',' + COALESCE(d.sport2, ''), n, 1) = ','
            
----------------------------------------------------------------------------------------------------
--split sportlevel2 field and create temp table for final result      
IF OBJECT_ID('Tempdb..#lvl2') IS NOT NULL 
    DROP TABLE #lvl2
SELECT  ROW_NUMBER() OVER ( PARTITION  BY Id, userid ORDER BY Id, userid , T.n ) AS RN ,
        id ,
        userid ,
        CASE WHEN CHARINDEX(',', sportlevel2) = 0 THEN sportlevel2
             ELSE SUBSTRING(sportlevel2, T.n,
                            CHARINDEX(',',
                                      SUBSTRING(sportlevel2 + ',', t.n, 100))
                            - 1)
        END AS sportlevel2
INTO    #lvl2
FROM    #data AS d
        JOIN #Tally AS T ON COALESCE(LEN(d.sportlevel2), 1) >= t.n
                            AND SUBSTRING(',' + COALESCE(d.sportlevel2, ''), n,
                                          1) = ','
 
----------------------------------------------------------------------------------------------------
--final data set          
IF OBJECT_ID('Tempdb..#Final') IS NOT NULL 
    DROP TABLE #Final
SELECT  COALESCE(S1.RN, L1.RN, S2.rn, L2.rn) AS RN ,
        COALESCE(S1.id, L1.id, S2.id, L2.id) AS id ,
        COALESCE(S1.userid, L1.userid, S2.userid, L2.userid) AS userid ,
        S1.sport1 ,
        L1.sportlevel1 ,
        S2.sport2 ,
        L2.sportlevel2
INTO    #Final
FROM    #sprt1 AS S1
        FULL JOIN #lvl1 AS L1 ON S1.id = L1.id
                                 AND S1.RN = L1.RN
        FULL JOIN #sprt2 AS S2 ON COALESCE(S1.id, L1.id) = S2.id
                                  AND COALESCE(S1.RN, L1.RN) = S2.RN
        FULL JOIN #lvl2 AS L2 ON COALESCE(S1.id, L1.id, S2.id) = L2.id
                                 AND COALESCE(S1.RN, L1.RN, S2.rn) = L2.RN
----------------------------------------------------------------------------------------------------
--Final output query
SELECT  CONVERT(VARCHAR, id) + '_'
        + CONVERT(VARCHAR, ROW_NUMBER() OVER ( PARTITION BY id ORDER BY userid , part , sport )) AS id ,
        userid ,
        sport ,
        sportlevel1 ,
        sportlevel2
FROM    ( SELECT    1 part ,
                    id ,
                    userid ,
                    sport1 AS sport ,
                    sportlevel1 ,
                    CASE WHEN sport1 = sport2 THEN sportlevel2
                         ELSE NULL
                    END sportlevel2
          FROM      #final
          WHERE     sport1 IS NOT NULL
          UNION ALL
          SELECT    2 ,
                    id ,
                    userid ,
                    sport2 ,
                    NULL ,
                    sportlevel2
          FROM      #final
          WHERE     sport2 IS NOT NULL
                    AND sport2 != COALESCE(sport1, '')
        ) FinalDataset
 
Community
  • 1
  • 1
Vasily
  • 5,707
  • 3
  • 19
  • 34
  • Thank you so much. both solutions you propose work well! I will test it with my data. – RoLYroLLs Feb 18 '15 at 16:22
  • Your solution works great with the real data. However, i do see one issue I did not show due to lack of space nor did I mention it. Each of the sports columns can have multiple sports, not a max of 2, which my data shows. I'll update my question referencing this. – RoLYroLLs Feb 18 '15 at 17:02
  • @RoLYroLLs, take a look in my updated answer, also be informed that result in your post has wrong logic with 77 userid. Currently, code allows 10 commas, but you can change this value. if you need more than 100, you must use OPTION (MAXRECURSION 1000) at the end of the query, because by default recursion can not be greater than 100. – Vasily Feb 19 '15 at 04:44
  • thanks I updated the logic, I'll try out your updated solution with my real data and let you know. – RoLYroLLs Feb 19 '15 at 14:56
  • This is working great with our data. However, for 42 rows it takes 1 minute and 40 seconds to process and return 163 rows. Is there a way this can be optimized? – RoLYroLLs Feb 19 '15 at 16:14
  • @RoLYroLLs I just checked, 200 lines were processed in less than one second, and this is by using sql express, so the performance must be much higher on real server. So, problem must be in another area, but I will post another variants anyway, – Vasily Feb 19 '15 at 17:58
  • That will be awesome! I will be waiting. – RoLYroLLs Feb 19 '15 at 21:08
  • BTW: the table contains 130,000 rows. However, I have a `where` clause at the `part1` cte to restrict the selection base on date range. That returns 42 records, which takes the ~1 minute result. Is there a better way to optimize the query? should I be having the `where` clause elsewhere? Thanks! – RoLYroLLs Feb 19 '15 at 21:28
  • @RoLYroLLs, I think in this case you need strored procedure, post updated – Vasily Feb 21 '15 at 02:05
  • thanks. Works well, but another user produced code that didn't require extra things to be added. wish there was a way to award you partially for all your help! – RoLYroLLs Feb 23 '15 at 20:56
0

The following method uses CTEs, but they aren't recursive. The output is exactly what you want and can handle any number of values in the CSV fields. I used a SQLCLR-based string splitter that is freely available in the SQL# library (which I wrote, but the 2 String_Split functions are in the Free version). You can use another splitter if you prefer, but just make sure it isn't a scalar T-SQL UDF.

Please note that for the "new" ID field, rather than turning an efficient INT into a VARCHAR so that it can be concatenated with the unique sport # for that UserID, I turned it into a DECIMAL and put the unique sport # to the right of the decimal place. This should filter and sort faster than a string field. If someone can ever have more than 9 sports, then changing the / 10.0 into / 100.0.

SET NOCOUNT ON;

DECLARE @SampleData TABLE
(
  Id INT NOT NULL,
  UserID INT NOT NULL,
  Sport1 VARCHAR(500),
  SportLevel1 VARCHAR(500),
  Sport2 VARCHAR(500),
  SportLevel2 VARCHAR(500)
);

INSERT INTO @SampleData VALUES (1, 11, 'Baseball', 'Varsity', 'Baseball', 'Recreational');
INSERT INTO @SampleData VALUES (2, 22, 'Baseball,Basketball', 'Varsity,Junior Varsity', 'Baseball', 'Varsity');
INSERT INTO @SampleData VALUES (3, 33, 'Soccer', 'Varsity', 'Soccer,Track & Field', 'Recreational,Intramural');
INSERT INTO @SampleData VALUES (4, 44, NULL , NULL , 'Tennis', 'Varsity');
INSERT INTO @SampleData VALUES (5, 55, 'Volleyball', 'Varsity', NULL , NULL);
INSERT INTO @SampleData VALUES (6, 66, 'Baseball,Basketball', 'Varsity,Varsity', 'Soccer,Football', 'Varsity,Varsity');
INSERT INTO @SampleData VALUES (7, 77, 'Baseball,Basketball,Rowing', 'Varsity,Varsity,Varsity', 'Soccer,Football,Volleyball', 'Varsity,Varsity,Recreational');
---------------------------------------------------------------

;WITH s1 AS
(
    SELECT d1.Id, d1.UserID, v1.SplitNum, v1.SplitVal AS [Sport]
    FROM   @SampleData d1
    CROSS APPLY SQL#.String_Split4k(d1.Sport1, N',', 1) v1
), sl1 AS
(
    SELECT d2.Id, d2.UserID, v2.SplitNum, v2.SplitVal AS [SportLevel1]
    FROM   @SampleData d2
    CROSS APPLY SQL#.String_Split4k(d2.SportLevel1, N',', 1) v2
), s2 AS
(
    SELECT d3.Id, d3.UserID, v3.SplitNum, v3.SplitVal AS [Sport]
    FROM   @SampleData d3
    CROSS APPLY SQL#.String_Split4k(d3.Sport2, N',', 1) v3
), sl2 AS
(
    SELECT d4.Id, d4.UserID, v4.SplitNum, v4.SplitVal AS [SportLevel2]
    FROM   @SampleData d4
    CROSS APPLY SQL#.String_Split4k(d4.SportLevel2, N',', 1) v4
)
SELECT COALESCE(s1.UserID, s2.UserID) +
          (ROW_NUMBER() OVER (PARTITION BY COALESCE(s1.UserID, s2.UserID)
                              ORDER BY CASE
                                          WHEN sl1.SportLevel1 IS NOT NULL
                                              AND sl2.SportLevel2 IS NOT NULL THEN 1
                                          WHEN sl1.SportLevel1 IS NOT NULL THEN 2
                                          ELSE 3
                                       END ASC,
                                       COALESCE(s1.Sport, s2.Sport) ASC
                             ) / 10.0
          ) AS [UserID],
       COALESCE(s1.Sport, s2.Sport) AS [Sport],
       sl1.SportLevel1,
       sl2.SportLevel2
FROM   s1
INNER JOIN sl1
        ON sl1.Id = s1.Id
        AND sl1.SplitNum = s1.SplitNum
FULL OUTER JOIN (
            s2
    INNER JOIN sl2
            ON sl2.Id = s2.Id
            AND sl2.SplitNum = s2.SplitNum
        )
        ON s2.Id = s1.Id
        AND s2.Sport = s1.Sport
ORDER BY COALESCE(s1.UserID, s2.UserID) ASC,
         CASE WHEN sl1.SportLevel1 IS NOT NULL THEN 1 ELSE 2 END ASC,
         COALESCE(s1.Sport, s2.Sport) ASC;
Solomon Rutzky
  • 46,688
  • 9
  • 128
  • 171
  • thanks for your input another user gave a great solution without adding other things. Thanks so much though! – RoLYroLLs Feb 23 '15 at 20:56