python pandas sorting by group

Question

Each row in my DataFrame is a user vote entry for a restaurant. The data look like

id   cuisine    
91   american   
3    american   
91   american   
233  cuban      
233  cuban      
2    cuban

where id refers to the restaurant.

I want to get something like the following

american  91   100
          3    30
          12   10
cuban     233  80
          2    33
mexican   22   99
          8    98
          21   82

where the 2nd column is the id, and the 3rd column is the number of rows in the DataFrame for that id. In other words, sort by the number of rows, but group by cuisine. I tried

g = df.groupby(['cuisine', 'id'])
c = g.size().sort_values(ascending=False)

But the order of the cuisines is mixed.

take a look at pivot tables : http://pandas.pydata.org/pandas-docs/stable/generated/pandas.pivot_table.html — dot.Py, Jul 19 '16 at 16:48

MaxU - stand with Ukraine · Answer 1 · 2016-07-19T17:06:26.997

2

is that what you want?

In [2]: df
Out[2]:
    id   cuisine
0   91  american
1    3  american
2   91  american
3  233     cuban
4  233     cuban
5    2     cuban

In [3]: df.groupby(['cuisine', 'id']).size()
Out[3]:
cuisine   id
american  3      1
          91     2
cuban     2      1
          233    2
dtype: int64

or as a data frame:

In [10]: df.groupby(['cuisine', 'id']).size().reset_index(name='count').sort_values(['cuisine', 'count'], ascending=[1,0])
Out[10]:
    cuisine   id  count
1  american   91      2
0  american    3      1
3     cuban  233      2
2     cuban    2      1

edited Jul 19 '16 at 17:06

answered Jul 19 '16 at 16:56

MaxU - stand with Ukraine

205,989
36
386
419

Sorry my example was too simple. On my real data, if I do what you suggested, then the count within the cuisine is not sorted. – nos Jul 19 '16 at 17:03
@nos, i've updated my answer - please check. Please provide a [reproducible sample and desired data sets](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) in future – MaxU - stand with Ukraine Jul 19 '16 at 17:06
@nos, please check [piRSquared's solution](http://stackoverflow.com/a/38464909/5741205) - it's much better – MaxU - stand with Ukraine Jul 19 '16 at 17:33

score 2 · Accepted Answer · answered Jul 19 '16 at 17:27

2

use value_counts after group_by followed by sort_index

# ascending=[1, 0] says True for level[0], False for level[1]
df.groupby('cuisine').id.value_counts().sort_index(ascending=[1, 0])

cuisine   id 
american  91     2
          3      1
cuban     233    2
          2      1
Name: id, dtype: int64

answered Jul 19 '16 at 17:27

piRSquared

285,575
57
475
624

The `value_counts()` is sorted in descending order by default. Even the final sort can be saved! – nos Jul 19 '16 at 17:37

python pandas sorting by group

2 Answers2