Pandas - How to combine duplicate items into one with several columns

Question

I have the below DataFrame

As you can see, ItemNo 1 is duplicated three times, and each column has a value corresponding to it.

I am looking for a method to check against all columns, and if they match then put Price, Sales, and Stock as one entry, not three.

Any help will be greatly appreciated.

Please include a _small_ subset of your data as a __copyable__ piece of code that can be used for testing as well as your expected output for the __provided__ data. See [MRE - Minimal, Reproducible, Example](https://stackoverflow.com/help/minimal-reproducible-example), and [How to make good reproducible pandas examples](https://stackoverflow.com/q/20109391/15497888). — Henry Ecker, May 18 '21 at 14:28
Please include any relevant information [as text directly into your question](https://stackoverflow.com/editing-help), do not link or embed external images of source code or data. Images make it difficult to efficiently assist you as they cannot be copied and offer poor usability as they cannot be searched. See: [Why not upload images of code/errors when asking a question?](https://meta.stackoverflow.com/q/285551/15497888) — Henry Ecker, May 18 '21 at 14:28

Pawan Jain · Accepted Answer · 2021-05-18T15:15:06.710

Remove the NaN values from each row, then reassign the column names:

import pandas as pd
# For each row, drop the NaN cells and pack the remaining values to the left
df = df1.apply(lambda x: pd.Series(x.dropna().values), axis=1)
df.columns = ['ItemNo','Category','SIZE','Model','Customer','Week Date','<New col name>']
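As a rough illustration of what the row-wise `dropna` step does, here is a minimal sketch on a made-up frame. The data and the `Value` column name are assumptions for the example only; the real frame from the question is not available as text.

```python
import numpy as np
import pandas as pd

# Hypothetical data: every row carries its single value in a different column.
df1 = pd.DataFrame({
    "ItemNo": [1, 1, 1],
    "Price": [9.5, np.nan, np.nan],
    "Sales": [np.nan, 20.0, np.nan],
    "Stock": [np.nan, np.nan, 7.0],
})

# Drop the NaN cells of each row; the surviving values get positional labels 0, 1, ...
compact = df1.apply(lambda row: pd.Series(row.dropna().values), axis=1)

# Rename the positional columns ("Value" is a placeholder name).
compact.columns = ["ItemNo", "Value"]
print(compact)
```

After this step the values sit in positional columns, so the label of the column each value originally came from is not retained.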

To collapse the duplicates into a single row, use groupby like this:

df.groupby('ItemNo', as_index=False).first()
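To see why `first()` merges the rows, here is a small sketch. It assumes a layout in which each duplicate row holds only one of Price, Sales, and Stock and NaN elsewhere; the data is invented for illustration. Within each group, `GroupBy.first()` returns the first non-null value of every column.

```python
import numpy as np
import pandas as pd

# Hypothetical data: ItemNo 1 appears three times, one value per row.
df = pd.DataFrame({
    "ItemNo": [1, 1, 1, 2],
    "Category": ["A", "A", "A", "B"],
    "Price": [9.5, np.nan, np.nan, 3.0],
    "Sales": [np.nan, 20.0, np.nan, np.nan],
    "Stock": [np.nan, np.nan, 7.0, 4.0],
})

# first() picks the first non-null entry per column inside each ItemNo group,
# so the three partial rows for ItemNo 1 become one complete row.
merged = df.groupby("ItemNo", as_index=False).first()
print(merged)
```

A column that is NaN in every row of a group (Sales for ItemNo 2 here) stays NaN in the result.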

edited May 18 '21 at 15:15

answered May 18 '21 at 15:00


Hey, thanks for this! I am a lot closer, but with this method I still get ItemNo 1 three times, and each instance has its own value. I am trying to change it so that ItemNo 1 has only one row, with three columns holding the corresponding values – Nairda123 May 18 '21 at 15:04
Hey, I updated it; you can use groupby to resolve it – Pawan Jain May 18 '21 at 15:15
That is great, thank you! Is there an option to group by more fields? I believe I am deleting a lot of values; ideally I would need to check all of the columns before 'Price' and, if all of these match, do the groupby operation :) (The dates and ItemNo are often the same, with a different customer, etc.) – Nairda123 May 18 '21 at 15:32
Yes, you can use `df.groupby(['ItemNo', 'Category', 'Size',...], as_index=False).first()` for multiple columns, and add more in the same way :) – Pawan Jain May 18 '21 at 15:37
Life saver! Pandas is so neat, thank you dude! – Nairda123 May 18 '21 at 16:00
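The multi-column grouping discussed in the comments can be sketched as follows. The data is invented, and taking "every column before `Price`" as the grouping keys is an assumption based on the asker's description, not something confirmed by the original frame.

```python
import numpy as np
import pandas as pd

# Hypothetical data: same ItemNo, but two different customers.
df = pd.DataFrame({
    "ItemNo": [1, 1, 1, 1],
    "Customer": ["X", "X", "X", "Y"],
    "Price": [9.5, np.nan, np.nan, 8.0],
    "Sales": [np.nan, 20.0, np.nan, 5.0],
    "Stock": [np.nan, np.nan, 7.0, 2.0],
})

# Use every column to the left of "Price" as a grouping key.
key_cols = list(df.columns[: df.columns.get_loc("Price")])

# Rows are merged only when all key columns match.
merged = df.groupby(key_cols, as_index=False).first()
print(merged)
```

Here the three rows for customer X collapse into one, while customer Y keeps its own row. Note that `groupby` drops rows with NaN in a key column by default; in pandas 1.1 and later, `dropna=False` keeps them.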
