Long to Wide Transformation with Duplicate Group Names

Question

I need to transform the following data from long to wide format. The issue is the the group is name is potentially duplicating. Also, the number of duplicating entries is unknown so I need to account for that as well.

This is the desired result:

desired_data = [[1,'Joe Tell','111-222-3333','Al Mead','222-333-4444','Ted Shaw','444-344-2323'],       
        [2,'Don Roads','555-222-5213','','','Ted Frank','222-444-2323']]


df_result = pd.DataFrame(desired_data, columns = ['ID', 'Primary 1 - Name','Primary 1 - Phone',
                                                  'Primary 2 - Name','Primary 2 - Phone',
                                                  'Secondary 1 - Name','Secondary 1 - Phone'])

I have tried pivoting the data but it errors on the duplicate group issue. I have tried manually adding a prefix to the "Sequence" name but it is getting messy.

data = [[1,'Joe Tell','Primary','111-222-3333'], 
        [1,'Al Mead','Primary','222-333-4444'], 
        [1,'Ted Shaw','Secondary','444-344-2323'],
        [2,'Don Roads','Primary','555-222-5213'],
        [2,'Ted Frank','Secondary','222-444-2323']]

df = pd.DataFrame(data, columns = ['ID', 'Name','Sequence','Phone']) 


df.pivot(index='ID', columns='Sequence', values=['Name','Phone'])

The main logic is answered in Question 10 of the duplicate. I added a wiki here to show you exactly how to implement that idea, given you provided such a complete example. — ALollz, Aug 27 '19 at 18:28

score 0 · Accepted Answer · answered Aug 27 '19 at 18:26

This is a common problem, you need to create another index level for the columns with cumcount. Then pivot_table and use first for the aggregation. We can collapse the column MultiIndex.

df['idx'] = df.groupby(['ID', 'Sequence']).cumcount()+1

res = (df.pivot_table(index='ID', columns=['Sequence', 'idx'], values=['Name', 'Phone'],
                      aggfunc='first')
         .sort_index(level=[1, 2], axis=1))

res.columns = [f'{seq} {num} - {item}' for item,seq,num in res.columns]

   Primary 1 - Name Primary 1 - Phone Primary 2 - Name Primary 2 - Phone Secondary 1 - Name Secondary 1 - Phone
ID                                                                                                             
1          Joe Tell      111-222-3333          Al Mead      222-333-4444           Ted Shaw        444-344-2323
2         Don Roads      555-222-5213              NaN               NaN          Ted Frank        222-444-2323

Thank you so much. The part I needed was the group by cumcount section! — bikerider, Aug 27 '19 at 20:34

Long to Wide Transformation with Duplicate Group Names

1 Answers1

Related