Pandas duplicates when grouped

Question

x = df.groupby(["Customer ID", "Category"]).sum().sort_values(by="Value", ascending=False)

I want to group by Customer ID, but when I use the code above, the customers are duplicated in the result...

Here is the result:

Source DF:

  Customer ID Category  Value
0           A        x      5
1           B        y      5
2           B        z      6
3           C        x      7
4           A        z      2
5           B        x      5
6           A        x      1

new: https://ufile.io/dpruz

Could you provide a small sample data set and your desired data set? — MaxU - stand with Ukraine, Nov 30 '17 at 22:10
Please read [how to make good reproducible pandas examples](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) and edit your post correspondingly. — MaxU - stand with Ukraine, Nov 30 '17 at 22:32

score 2 · Accepted Answer · answered Nov 30 '17 at 23:09

I think you are looking for something like this:

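df_out = df.groupby(['Customer ID', 'Category']).sum()
# Reorder customers by total Value; groupby(level=0).sum() replaces sum(level=0), removed in pandas 2.0
df_out.reindex(df_out.groupby(level=0).sum().sort_values('Value', ascending=False).index, level=0)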

Output:

                      Value
Customer ID Category       
B           x             5
            y             5
            z             6
A           x             6
            z             2
C           x             7
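
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample frame from the question and applies the same reindex idea. The variable names `totals` and `result` are mine, not from the answer, and `groupby(level=0).sum()` is used for the per-customer totals so that it also runs on pandas 2.x.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Customer ID": ["A", "B", "B", "C", "A", "B", "A"],
    "Category": ["x", "y", "z", "x", "z", "x", "x"],
    "Value": [5, 5, 6, 7, 2, 5, 1],
})

# One row per (Customer ID, Category) pair.
df_out = df.groupby(["Customer ID", "Category"]).sum()

# Per-customer totals, largest first (B=16, A=8, C=7).
totals = df_out.groupby(level=0)["Value"].sum().sort_values(ascending=False)

# Reorder only the outer index level so customers follow the totals.
result = df_out.reindex(totals.index, level=0)
print(result)
```

Each customer now appears once in the outer level, with its categories listed underneath, and the customers are ordered by their overall total rather than by individual rows.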

Scott Boston


Yes Thanks. May I ask a question too please? – yigitozmen Nov 30 '17 at 23:10
Can I get the first 100 Customer IDs? And can you suggest a good way to learn DataFrames well? Thank you. – yigitozmen Nov 30 '17 at 23:11
One way you could do it is to add .head(100) directly in front of .index above. For example, let's say I want the top 2 from this list of three: `df_out.reindex(df_out.groupby(level=0).sum().sort_values('Value', ascending=False).head(2).index,level=0)` – Scott Boston Nov 30 '17 at 23:17
How do you learn DataFrames well? Well, start answering questions on Stack Overflow. Even if the answers have been accepted, answer them yourself. You will get better over time. Read the entire Pandas documentation and work through the examples. – Scott Boston Nov 30 '17 at 23:18
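
A runnable sketch of the top-N idea from the comment above, using the question's sample data. The names `n`, `top_ids` and `top` are hypothetical helpers for this illustration; with real data you would set `n = 100`.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Customer ID": ["A", "B", "B", "C", "A", "B", "A"],
    "Category": ["x", "y", "z", "x", "z", "x", "x"],
    "Value": [5, 5, 6, 7, 2, 5, 1],
})

df_out = df.groupby(["Customer ID", "Category"]).sum()

n = 2  # number of customers to keep (hypothetical; 100 in the comment)

# IDs of the n customers with the largest total Value.
top_ids = (
    df_out.groupby(level=0)["Value"].sum()
          .sort_values(ascending=False)
          .head(n)
          .index
)

# Reindexing on the outer level keeps only those customers, in that order.
top = df_out.reindex(top_ids, level=0)
print(top)
```

Customers that are not in `top_ids` are dropped entirely, so all category rows of the kept customers survive while the rest disappear.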
You are awesome. Thanks. – yigitozmen Nov 30 '17 at 23:21
@MaxU There has to be an easier way to sort multiindex level 0 by values. Right? – Scott Boston Nov 30 '17 at 23:23
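
One possible alternative, offered only as a sketch and not as anything proposed in this thread: broadcast each customer's total onto its rows with `transform`, sort on that helper column, then drop it. The column name `Total` is an assumption made for this example, and a stable sort (`mergesort`) is used so that rows of the same customer should stay together in their original order.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Customer ID": ["A", "B", "B", "C", "A", "B", "A"],
    "Category": ["x", "y", "z", "x", "z", "x", "x"],
    "Value": [5, 5, 6, 7, 2, 5, 1],
})

df_out = df.groupby(["Customer ID", "Category"]).sum()

# Per-customer total repeated on every row of that customer.
per_row_total = df_out.groupby(level="Customer ID")["Value"].transform("sum")

result = (
    df_out.assign(Total=per_row_total)  # "Total" is a temporary helper column
          .sort_values("Total", ascending=False, kind="mergesort")
          .drop(columns="Total")
)
print(result)
```

Whether this is actually easier than the `reindex(..., level=0)` approach is a matter of taste; it avoids reindexing but needs a temporary column.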
