x = df.groupby(["Customer ID", "Category"]).sum().sort_values(by="VALUE", ascending=False)

I want to group by Customer ID, but when I use the code above, the same customer shows up in several places in the result instead of staying together.

Here is the result:

[Image]

Source DF:

  Customer ID Category  Value
0           A        x      5
1           B        y      5
2           B        z      6
3           C        x      7
4           A        z      2
5           B        x      5
6           A        x      1
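
For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the sample frame above and applies the same group-then-sort step. It assumes the column is named `Value`, as in the sample. The variable names `grouped`, `by_value` and `customer_order` are only for illustration.

```python
import pandas as pd

# Sample data copied from the "Source DF" table.
df = pd.DataFrame({
    "Customer ID": ["A", "B", "B", "C", "A", "B", "A"],
    "Category": ["x", "y", "z", "x", "z", "x", "x"],
    "Value": [5, 5, 6, 7, 2, 5, 1],
})

# One row per (Customer ID, Category) pair.
grouped = df.groupby(["Customer ID", "Category"]).sum()

# Sorting the rows by Value alone ignores which customer a row belongs to,
# so rows of the same customer can end up far apart.
by_value = grouped.sort_values(by="Value", ascending=False)

customer_order = by_value.index.get_level_values("Customer ID").tolist()
print(by_value)
print(customer_order)
```

Customer A's two rows (values 6 and 2) are separated by B's rows, which is most likely the "duplication" seen in the result.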

new: https://ufile.io/dpruz

yigitozmen

1 Answer

I think you are looking for something like this:

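# Sum Value for each (Customer ID, Category) pair
df_out = df.groupby(['Customer ID','Category']).sum()
# Reorder customers by their total Value; groupby(level=0).sum() replaces sum(level=0), which was removed in pandas 2.0
df_out.reindex(df_out.groupby(level=0).sum().sort_values('Value', ascending=False).index,level=0)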

Output:

                      Value
Customer ID Category       
B           x             5
            y             5
            z             6
A           x             6
            z             2
C           x             7
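
For reference, here is a self-contained sketch of the same idea that can be pasted and run as-is. It rebuilds the sample frame from the question and computes the per-customer totals with `groupby(level=0)`, which should also work on pandas versions where `sum(level=0)` is no longer available. The names `totals` and `result` are just for illustration.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Customer ID": ["A", "B", "B", "C", "A", "B", "A"],
    "Category": ["x", "y", "z", "x", "z", "x", "x"],
    "Value": [5, 5, 6, 7, 2, 5, 1],
})

# Sum Value for each (Customer ID, Category) pair.
df_out = df.groupby(["Customer ID", "Category"]).sum()

# Total Value per customer (index level 0), largest first.
totals = df_out.groupby(level=0)["Value"].sum().sort_values(ascending=False)

# Reorder the first index level to follow the sorted totals;
# each customer's rows stay together.
result = df_out.reindex(totals.index, level=0)
print(result)
```

The customer totals here are B = 16, A = 8 and C = 7, which is why B comes first in the output.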
Scott Boston
  • Yes, thanks. May I ask another question, please? – yigitozmen Nov 30 '17 at 23:10
  • Can I get the first 100 Customer IDs? And can you suggest a good way to learn DataFrames well? Thank you. – yigitozmen Nov 30 '17 at 23:11
  • One way you could do it is to add .head(100) directly in front of .index above. For example, let's say I want the top 2 from this list of three: `df_out.reindex(df_out.groupby(level=0).sum().sort_values('Value', ascending=False).head(2).index,level=0)` – Scott Boston Nov 30 '17 at 23:17
  • How do you learn DataFrames well? Start answering questions on Stack Overflow. Even if a question already has an accepted answer, answer it yourself. You will get better over time. Read the entire pandas documentation and work through the examples. – Scott Boston Nov 30 '17 at 23:18
  • You are awesome. Thanks. – yigitozmen Nov 30 '17 at 23:21
  • @MaxU There has to be an easier way to sort multiindex level 0 by values. Right? – Scott Boston Nov 30 '17 at 23:23
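
On the last two points in the comments (keeping only the top customers, and whether there is another way to sort level 0 by values), one possible alternative is sketched below. It is not from the answer: it broadcasts each customer's total onto its rows with `transform` and sorts on that helper column. The names `customer_total`, `ordered`, `top_customers` and `top2` are hypothetical.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Customer ID": ["A", "B", "B", "C", "A", "B", "A"],
    "Category": ["x", "y", "z", "x", "z", "x", "x"],
    "Value": [5, 5, 6, 7, 2, 5, 1],
})

df_out = df.groupby(["Customer ID", "Category"]).sum()

# Per-customer total, repeated on every (customer, category) row.
customer_total = df_out.groupby(level="Customer ID")["Value"].transform("sum")

# Sort by total (descending), then by the index levels so each
# customer's rows stay together in a fixed order.
ordered = (
    df_out.assign(customer_total=customer_total)
    .sort_values(
        ["customer_total", "Customer ID", "Category"],
        ascending=[False, True, True],
    )
    .drop(columns="customer_total")
)

# Keep only the 2 customers with the largest totals (use 100 for the top 100).
top_customers = df_out.groupby(level="Customer ID")["Value"].sum().nlargest(2).index
top2 = ordered[ordered.index.get_level_values("Customer ID").isin(top_customers)]

print(ordered)
print(top2)
```

Sorting on an explicit helper column makes the ordering rule visible, at the cost of adding and then dropping a temporary column.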