0

I am working in Databricks > Python /w Pandas. I want to restrucure the data to move from:

ID Account Phone
1234 1 4437935470
1234 1 4437935470
1234 2 4437935472
1234 2 4437935473
1235 3 4437935474
1235 4 4437935475
1236 4 4437935476
1236 4 4437935477

To:

ID Account Phone 1 Phone 2 Phone 3 Phone N etc.
1234 1 4437935470 null/empty/etc null/empty/etc
1234 2 4437935472 4437935473 null/empty/etc
1235 3 4437935474 null/empty/etc null/empty/etc
1235 4 4437935475 4437935476 4437935477

Ideally, this could all be done using Pandas Dataframe functions. I have limited experience within the data world and with Python/Pandas specifically, so I would consider other options within the context of Databricks / Python approach, if needed.

TGracchus
  • 3
  • 2

1 Answers1

0

This is enssentially pivot with two columns, except that you need to remove duplicates. Read more on this guide. Try:

(df.drop_duplicates(['ID','Account','Phone'])
   .assign(col=df.groupby(['ID','Account']).cumcount()+1)
   .set_index(['ID','Account','col'])
   ['Phone'].unstack().add_prefix('Phone ')
   .reset_index()
)

Output (note that there is a typo in your sample data vs. the expected output):

col    ID Account     Phone 1     Phone 2     Phone 3
0    1234       1  4437935470         NaN         NaN
1    1234       2  4437935472  4437935473         NaN
2    1235       3  4437935474         NaN         NaN
3    1236       4  4437935475  4437935476  4437935477
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74
  • AttributeError: 'GroupedData' object has no attribute 'cumcount' – TGracchus Mar 23 '21 at 16:32
  • Make sure that `df` is a Pandas dataframe, not some `databricks` thingies. Sorry I'm not familiar with `databricks`, you need to figure that part out. – Quang Hoang Mar 23 '21 at 16:39
  • I ran an info on the object before executing your code modified with my headers and I get this: . Thanks for the speedy replies! – TGracchus Mar 23 '21 at 16:47