0

I have the following table in pandas (notice how the item repeats for each warehouse)

id Item Warehouse Price Cost
1 Cake US: California 30 20
1 Cake US: Chicago 30 20
2 Meat US: California 40 10
2 Meat US: Chicago 40 10

And I need to add each warehouse as a separate column like this:

id Item Warehouse 1 Warehouse 2 Price Cost
1 Cake US: California US: Chicago 30 20
2 Meat US: California US: Chicago 40 10

Data:

{'id': [1, 1, 2, 2],
 'Item': ['Cake', 'Cake', 'Meat', 'Meat'],
 'Warehouse': ['US: California',
  'US: Chicago',
  'US: California',
  'US: Chicago'],
 'Price': [30, 30, 40, 40],
 'Cost': [20, 20, 10, 10]}

1 Answers1

1

You could assign a number to each warehouse for each id using groupby + cumcount; then pivot:

out = (df.assign(col_idx=df.groupby('Item').cumcount().add(1))
       .pivot(['id', 'Item', 'Price', 'Cost'], 'col_idx', 'Warehouse')
       .add_prefix('Warehouse ').reset_index().rename_axis(columns=[None]))

or you could use groupby + agg(list); then construct a DataFrame with the Warehouse column and join:

out = df.groupby(['id', 'Item', 'Price', 'Cost']).agg(list).reset_index()
out = (out.drop(columns='Warehouse')
       .join(pd.DataFrame(out['Warehouse'].tolist(), columns=['Warehouse 1', 'Warehouse 2'])))

Output:

   id  Item  Price  Cost     Warehouse 1  Warehouse 2
0   1  Cake     30    20  US: California  US: Chicago
1   2  Meat     40    10  US: California  US: Chicago