How to Deduplicate an id column by adding an overflow column?

Question

Alright guys I need some help! I am need to duplicate the ID Column and I'm having trouble with adding a column without loosing important data. Is there a way to make an "overflow column" that would take the secondary [tags] and put them into a new column?

Here is the example:

 **UniqueId**           **Age**     **Zip***              **Tag**
    1                      20          11111               yellow
    2                      25          33333                blue
    2                      25          33333               black
    3                      30          44444               purple
    3                      30          44444                pink
    3                      30          44444                white

This is what i want the output to look like

  **UniqueId**          **Age**     **Zip***    **Tag1**  **Tag2**  **Tag3**
    1                      20          11111      yellow     NULL      NULL
    2                      25          33333       blue      black     NULL
    3                      30          44444      purple     pink     white

Your help would be greatly appreciated!!!

If there are only a change of 3 duplicates, then you can use a window function and aggregation, otherwise I'd use a Dynamic Pivot. Can you specify? Note, if it's N number of duplicates then you can google dynamic pivot and find tons of examples — S3S, Apr 18 '18 at 21:17
Just some hours ago: https://stackoverflow.com/questions/49891683/how-do-you-use-transpose-the-following-table-in-sql/49893943#49893943 — Alexander Volok, Apr 18 '18 at 21:18

score 4 · Accepted Answer · answered Apr 18 '18 at 21:31

If you now the maximum number of tags, you can use pivot or conditional aggregation:

select t.uniqueid, t.age, t.zip,
       max(case when seqnum = 1 then tag end) as tag_1,
       max(case when seqnum = 2 then tag end) as tag_2,
       max(case when seqnum = 3 then tag end) as tag_2
from (select t.*,
             row_number() over (partition by uniqueid order by (select null)) as seqnum
      from t
     ) t
group by t.uniqueid, t.age, t.zip;

score 4 · Answer 2 · answered Apr 18 '18 at 21:44

Though I tend to prefer conditional aggregations as Gordon illustrated... they offer a bit more flexibilty.

You can do a simple PIVOT

Example

Select *
 From  (
        Select UniqueID
              ,Age
              ,Zip
              ,Tag
              ,Col = concat('Tag',Row_Number() over (Partition By UniqueId order by Tag) )
         From YourTable
       ) A 
 Pivot (Max(Tag) for Col in ([Tag1],[Tag2],[Tag3],[Tag4]) ) p

Returns

UniqueID    Age Zip     Tag1    Tag2    Tag3    Tag4
1           20  11111   yellow  NULL    NULL    NULL
2           25  33333   black   blue    NULL    NULL
3           30  44444   pink    purple  white   NULL

score 1 · Answer 3 · answered Apr 19 '18 at 07:26

attention: Do not store the age as an int number! Rather store a DOB and compute the age...

This is not a real answer to your question, but the thing you should rather do:

To be honest: Your question can be solved (and there are good answers already), but you should not do this.
Whenever you feel the need to add numbers to a field's name (Tag1, Tag2...) the design will be wrong (almost ever). Push these values into a related side table (just the Id and the tag), remove the column from your original table and place a foreign key pointing to the new table. Now you can join these values whenever you need them. PIVOT (or conditional aggregation) is for output only...

This is completely untested, so be careful with your data (backup!), but something along these lines should work:

CREATE TABLE TagTable (ID INT IDENTITY
                      ,FKOriginal INT NOT NULL CONSTRAINT FK_TagTable_OriginalTable FOREIGN KEY REFERENCES OriginalTable(UniqueId)
                      ,Tag VARCHAR(100) NOT NULL);

--an index to support the fk
CREATE NONCLUSTERED INDEX IX_TagTable_FKOriginal ON TagTable(FKOriginal);
GO

--shift the existing data
INSERT INTO TagTable --you might use DISTINCT...
SELECT UniqueId,Tag
FROM OriginalTable;
GO

--delete duplicated rows
WITH cte AS
(
    SELECT *
          ,ROW_NUMBER() OVER(PARTITION BY UniqueId ORDER BY UniqueId) AS RowId --Find a better sort column if needed
    FROM OriginalTable 
)
DELETE FROM cte
WHERE RowId>1; --Only the first remains
GO

--throw away the tag column in the original table
ALTER TABLE OriginalTable DROP COLUMN Tag;
GO

--See the result via JOIN-Select
SELECT *
FROM OriginalTable AS o
INNER JOIN TagTable AS t ON o.UniqueId=t.FKOriginal;

If you need these pivoted columns, you can use the approaches provided in other answers with the final SELECT too.

Completely glossed over the the AGE bit. Great eye. – John Cappelletti Apr 19 '18 at 12:11 — John Cappelletti, Apr 19 '18 at 12:11

How to Deduplicate an id column by adding an overflow column?

3 Answers3