For duplicate values of multiple columns, create new columns with unique

Question

I have a dataset that looks like this in SQL

Col1     Col2     Col3     Col4
   A       re        T     huge
   A       re        T    small
   B       re        T     huge
   C       re        T     huge
   C       re        T   medium
   C       re        T    small

For any duplicates in the first 3 columns, I want to create new columns next to it with each unique value from Col4. My ultimate goal is to join this to another dataset and I want to include all the information.

I want the final dataset to look like this:

Col1     Col2     Col3     Col4    Col5    Col6 ....
   A       re        T     huge   small    NULL
   B       re        T     huge    NULL    NULL
   C       re        T     huge  medium   small

Why not just count the number of huge, medium, and small? An unlimited number of additional columns seems unnecessary. — Gordon Linoff, Jul 16 '18 at 15:23
Because my goal is to join by col1, col2, col3 to another dataset and display the results from Col4 from the original dataset. If I do a count then I would be losing those unique values which I want in my final dataset. If I don't have those unique values in one row then additional rows will be included in the final dataset which will alter the performance numbers — nak5120, Jul 16 '18 at 15:27
This rearrangement of the data is not the best way to solve your actual problem. You should ask another question with more information about what you really want to do. — Gordon Linoff, Jul 16 '18 at 16:11

score 0 · Answer 1 · answered Jul 16 '18 at 15:37

You can do this with a CTE that generates a ROW_NUMBER() over partition by the first 3 columns, and ordered by whatever logic you want to use.

Then you select from that CTE grouping by the first three columns and doing a sub-select for each additional column getting the associated RowNumber=1 for the first additional column, RowNumber=2 for the second additional column, etc.

Alternative to subselects, you could use MAX(CASE) expressions instead.

Note that if you don't know how many additional columns there will be, then you will have to do this with dynamic sql.

score 0 · Answer 2 · answered Jul 16 '18 at 15:44

I think this is what you want

SELECT AA.col1, 
       AA.col2, 
       AA.col3, 
       small.col4  AS col4, 
       huge.col4   AS col5, 
       medium.col4 AS col6 
FROM   (SELECT col1, 
               col2, 
               col3 
        FROM   your_table 
        GROUP  BY col1, 
                  col2, 
                  col3)AA 
       LEFT JOIN (SELECT col1, 
                         col2, 
                         col3, 
                         col4 
                  FROM   your_table 
                  WHERE  col4 = small)small 
              ON ( small.col1 = AA.col1 
                   AND small.col2 = AA.col2 
                   AND small.col3 = AA.col3 ) 
       LEFT JOIN (SELECT col1, 
                         col2, 
                         col3, 
                         col4 
                  FROM   your_table 
                  WHERE  col4 = huge)huge 
              ON ( huge.col1 = AA.col1 
                   AND huge.col2 = AA.col2 
                   AND huge.col3 = AA.col3 ) 
       LEFT JOIN (SELECT col1, 
                         col2, 
                         col3, 
                         col4 
                  FROM   your_table 
                  WHERE  col4 = medium)medium 
              ON ( medium.col1 = AA.col1 
                   AND medium.col2 = AA.col2 
                   AND medium.col3 = AA.col3 )

Thanks, I think this only works if you know the set number of duplicates. Also, if there are 10 duplicates, would I have to create 10 left joins with this logic? — nak5120, Jul 16 '18 at 16:30
if you are looking for dynamic solution then go through this https://stackoverflow.com/questions/10404348/sql-server-dynamic-pivot-query — kiran gadhe, Jul 17 '18 at 06:28

For duplicate values of multiple columns, create new columns with unique

2 Answers2