How do I modify table data to remove duplicates and "condense" horizontally in Pandas

Question

I am working in Databricks > Python /w Pandas. I want to restrucure the data to move from:

ID	Account	Phone
1234	1	4437935470
1234	1	4437935470
1234	2	4437935472
1234	2	4437935473
1235	3	4437935474
1235	4	4437935475
1236	4	4437935476
1236	4	4437935477

To:

ID	Account	Phone 1	Phone 2	Phone 3
1234	1	4437935470	null/empty/etc	null/empty/etc
1234	2	4437935472	4437935473	null/empty/etc
1235	3	4437935474	null/empty/etc	null/empty/etc
1235	4	4437935475	4437935476	4437935477

Ideally, this could all be done using Pandas Dataframe functions. I have limited experience within the data world and with Python/Pandas specifically, so I would consider other options within the context of Databricks / Python approach, if needed.

score 0 · Accepted Answer · answered Mar 23 '21 at 16:10

0

This is enssentially pivot with two columns, except that you need to remove duplicates. Read more on this guide. Try:

(df.drop_duplicates(['ID','Account','Phone'])
   .assign(col=df.groupby(['ID','Account']).cumcount()+1)
   .set_index(['ID','Account','col'])
   ['Phone'].unstack().add_prefix('Phone ')
   .reset_index()
)

Output (note that there is a typo in your sample data vs. the expected output):

col    ID Account     Phone 1     Phone 2     Phone 3
0    1234       1  4437935470         NaN         NaN
1    1234       2  4437935472  4437935473         NaN
2    1235       3  4437935474         NaN         NaN
3    1236       4  4437935475  4437935476  4437935477

answered Mar 23 '21 at 16:10

Quang Hoang

146,074
10
56
74

AttributeError: 'GroupedData' object has no attribute 'cumcount' – TGracchus Mar 23 '21 at 16:32
Make sure that `df` is a Pandas dataframe, not some `databricks` thingies. Sorry I'm not familiar with `databricks`, you need to figure that part out. – Quang Hoang Mar 23 '21 at 16:39
I ran an info on the object before executing your code modified with my headers and I get this: . Thanks for the speedy replies! – TGracchus Mar 23 '21 at 16:47

How do I modify table data to remove duplicates and "condense" horizontally in Pandas

1 Answers1