1

Each row in my DataFrame is a user vote entry for a restaurant. The data look like

id   cuisine    
91   american   
3    american   
91   american   
233  cuban      
233  cuban      
2    cuban      

where id refers to the restaurant.

I want to get something like the following

american  91   100
          3    30
          12   10
cuban     233  80
          2    33
mexican   22   99
          8    98
          21   82

where the 2nd column is the id, and the 3rd column is the number of rows in the DataFrame for that id. In other words, sort by the number of rows, but group by cuisine. I tried

g = df.groupby(['cuisine', 'id'])
c = g.size().sort_values(ascending=False)

But the order of the cuisines is mixed.

dot.Py
  • 5,007
  • 5
  • 31
  • 52
nos
  • 19,875
  • 27
  • 98
  • 134

2 Answers2

2

is that what you want?

In [2]: df
Out[2]:
    id   cuisine
0   91  american
1    3  american
2   91  american
3  233     cuban
4  233     cuban
5    2     cuban

In [3]: df.groupby(['cuisine', 'id']).size()
Out[3]:
cuisine   id
american  3      1
          91     2
cuban     2      1
          233    2
dtype: int64

or as a data frame:

In [10]: df.groupby(['cuisine', 'id']).size().reset_index(name='count').sort_values(['cuisine', 'count'], ascending=[1,0])
Out[10]:
    cuisine   id  count
1  american   91      2
0  american    3      1
3     cuban  233      2
2     cuban    2      1
MaxU - stand with Ukraine
  • 205,989
  • 36
  • 386
  • 419
  • Sorry my example was too simple. On my real data, if I do what you suggested, then the count within the cuisine is not sorted. – nos Jul 19 '16 at 17:03
  • @nos, i've updated my answer - please check. Please provide a [reproducible sample and desired data sets](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) in future – MaxU - stand with Ukraine Jul 19 '16 at 17:06
  • @nos, please check [piRSquared's solution](http://stackoverflow.com/a/38464909/5741205) - it's much better – MaxU - stand with Ukraine Jul 19 '16 at 17:33
2

use value_counts after group_by followed by sort_index

# ascending=[1, 0] says True for level[0], False for level[1]
df.groupby('cuisine').id.value_counts().sort_index(ascending=[1, 0])

cuisine   id 
american  91     2
          3      1
cuban     233    2
          2      1
Name: id, dtype: int64
piRSquared
  • 285,575
  • 57
  • 475
  • 624
  • The `value_counts()` is sorted in descending order by default. Even the final sort can be saved! – nos Jul 19 '16 at 17:37