The data set I'm working with is here. It's a data set of Chipotle orders. There is an item_name for each type of item sold, and within each item_name there are multiple possible choice_descriptions describing exactly what is in the item.
So let's import the file first and clean it up a bit:
import pandas as pd

chipo = pd.read_csv('chipotle.tsv', sep='\t')
del chipo['order_id']
del chipo['quantity']
# regex=False so the '$' is stripped literally rather than treated as a regex anchor
chipo['revenue'] = chipo['item_price'].str.replace('$', '', regex=False).astype(float)
chipo['count'] = 1  # column of ones, so summing it counts rows
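
As a quick sanity check (assuming the file loaded as expected), revenue should now come out as float64 and count as int64:

# revenue should be numeric after stripping the '$'; count is just ones
print(chipo.dtypes)
print(chipo.head())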
I'm trying to tabulate how many of each choice_description is ordered per item_name. This is easy enough using a groupby and then summing the columns (including the column of ones):
grouped = chipo.groupby(['item_name','choice_description']).sum()
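
At this point grouped carries a two-level (item_name, choice_description) index, with the numeric columns summed within each pair, so a quick peek shows the structure:

# grouped is indexed by (item_name, choice_description);
# 'count' now holds how many rows fell into each pair
print(grouped.head())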
but now I want to sort the counts within item_name. Oh, also I want to keep the item_price column for possible later calculation, so I can't just convert the DataFrame to a one-column frame. For the life of me I can't figure out how to do this.
I can't just call grouped.sort_values('count') because it sorts over all observations, not just within the item_name groups. How can I sort by that column but keep the sorting constrained within each group?
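
The closest I've gotten is flattening the index and sorting on two keys at once, which does order the counts within each item_name but throws away the grouped structure (a sketch of the workaround, not necessarily the idiomatic way):

# Workaround sketch: flatten the MultiIndex, then sort by item_name
# and by descending count within it
flat = grouped.reset_index()
flat = flat.sort_values(['item_name', 'count'], ascending=[True, False])

That gives the right ordering, but I lose the MultiIndex and it feels like I'm working around groupby rather than with it.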
Even though there seem to be previous answers discussing this (e.g. here), all of them either use deprecated sort commands (e.g. Series.order() or DataFrame.sort()) or don't do exactly what I need. How can I perform this seemingly simple task?
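
As a footnote, my understanding is that those older commands were folded into sort_values() around pandas 0.17, which is why the old answers no longer run as written:

# what the old answers call        what modern pandas expects
# series.order(...)           ->   series.sort_values(...)
# frame.sort('col', ...)      ->   frame.sort_values('col', ...)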