3

I need to transform the following data from long to wide format. The issue is the the group is name is potentially duplicating. Also, the number of duplicating entries is unknown so I need to account for that as well.

This is the desired result:

desired_data = [[1,'Joe Tell','111-222-3333','Al Mead','222-333-4444','Ted Shaw','444-344-2323'],       
        [2,'Don Roads','555-222-5213','','','Ted Frank','222-444-2323']]


df_result = pd.DataFrame(desired_data, columns = ['ID', 'Primary 1 - Name','Primary 1 - Phone',
                                                  'Primary 2 - Name','Primary 2 - Phone',
                                                  'Secondary 1 - Name','Secondary 1 - Phone'])

I have tried pivoting the data but it errors on the duplicate group issue. I have tried manually adding a prefix to the "Sequence" name but it is getting messy.

data = [[1,'Joe Tell','Primary','111-222-3333'], 
        [1,'Al Mead','Primary','222-333-4444'], 
        [1,'Ted Shaw','Secondary','444-344-2323'],
        [2,'Don Roads','Primary','555-222-5213'],
        [2,'Ted Frank','Secondary','222-444-2323']]

df = pd.DataFrame(data, columns = ['ID', 'Name','Sequence','Phone']) 


df.pivot(index='ID', columns='Sequence', values=['Name','Phone'])
bikerider
  • 101
  • 8

1 Answers1

0

This is a common problem, you need to create another index level for the columns with cumcount. Then pivot_table and use first for the aggregation. We can collapse the column MultiIndex.

df['idx'] = df.groupby(['ID', 'Sequence']).cumcount()+1

res = (df.pivot_table(index='ID', columns=['Sequence', 'idx'], values=['Name', 'Phone'],
                      aggfunc='first')
         .sort_index(level=[1, 2], axis=1))

res.columns = [f'{seq} {num} - {item}' for item,seq,num in res.columns]

   Primary 1 - Name Primary 1 - Phone Primary 2 - Name Primary 2 - Phone Secondary 1 - Name Secondary 1 - Phone
ID                                                                                                             
1          Joe Tell      111-222-3333          Al Mead      222-333-4444           Ted Shaw        444-344-2323
2         Don Roads      555-222-5213              NaN               NaN          Ted Frank        222-444-2323
ALollz
  • 57,915
  • 7
  • 66
  • 89