0

sample[:]

                                     rating
gender                                    F         M
title
$1,000,000 Duck (1971)             3.375000  2.761905
'Night Mother (1986)               3.388889  3.352941
'Til There Was You (1997)          2.675676  2.733333
'burbs, The (1989)                 2.793478  2.962085
...And Justice for All (1979)      3.828571  3.689024
1-900 (1994)                       2.000000  3.000000
10 Things I Hate About You (1999)  3.646552  3.311966
101 Dalmatians (1961)              3.791444  3.500000
101 Dalmatians (1996)              3.240000  2.911215
12 Angry Men (1957)                4.184397  4.328421

sample.info()

<class 'pandas.core.frame.DataFrame'>
Index: 10 entries, $1,000,000 Duck (1971) to 12 Angry Men (1957)
Data columns (total 2 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   (rating, F)  10 non-null     float64
 1   (rating, M)  10 non-null     float64
dtypes: float64(2)

I am trying to sort on the 'F' column using sample.sort_index or sample.sort_values, but I can't make either work. Is there a way to sort on just the 'F' column?

I have tried:

sample.sort_index(columns='F', ascending=False)
sample.sort_values(columns='F', ascending=False)
sample.sort_index(by='F', ascending=False)
sample.sort_values(by='F', ascending=False)

All of the above with 'F' replaced with 'rating,F'. Clearly I do not understand how to sort dataframes.

niraj
  • 17,498
  • 4
  • 33
  • 48
  • Does this answer your question? [Multi Index Sorting in Pandas](https://stackoverflow.com/questions/14733871/multi-index-sorting-in-pandas) – AMC Sep 19 '20 at 01:31

1 Answers1

0

It seems you have a MultiIndex as your column. Your first level has a single value of "rating" and your second level has your values ["F", "M"]. Regardless of sorting, in order to access a specific column within a MultiIndex you'll need to select your column using a tuple, first selecting the "rating" set of columns, then selecting either "M" or "F".

For your current data, this should work:

df.sort_values(by=("rating", "F"), ascending=False)

                                     rating          
gender                                    F         M
title                                                
12 Angry Men (1957)                4.184397  4.328421
...And Justice for All (1979)      3.828571  3.689024
101 Dalmatians (1961)              3.791444  3.500000
10 Things I Hate About You (1999)  3.646552  3.311966
'Night Mother (1986)               3.388889  3.352941
$1,000,000 Duck (1971)             3.375000  2.761905
101 Dalmatians (1996)              3.240000  2.911215
'burbs, The (1989)                 2.793478  2.962085
'Til There Was You (1997)          2.675676  2.733333
1-900 (1994)                       2.000000  3.000000

Using a tuple (instead of a list) means I want to sort by the single column ("rating", "F"). This is very different than when you use a list ["rating", "F"] for dataframe operations. Using a list declares that you have 2 separate columns named "rating" and "F" and you want to select both of them.

If the MultiIndex stuff is confusing- which it is, especially if you're new to pandas. You can fix your data to not use a MultiIndex for the column like so:

df = df["rating"]

# Once we get rid of the unnecessary first level of our multiindex
#  you can actually sort your Dataframe the way you originally thought you could
df.sort_values(by='F', ascending=False)

gender                                    F         M
title                                                
12 Angry Men (1957)                4.184397  4.328421
...And Justice for All (1979)      3.828571  3.689024
101 Dalmatians (1961)              3.791444  3.500000
10 Things I Hate About You (1999)  3.646552  3.311966
'Night Mother (1986)               3.388889  3.352941
$1,000,000 Duck (1971)             3.375000  2.761905
101 Dalmatians (1996)              3.240000  2.911215
'burbs, The (1989)                 2.793478  2.962085
'Til There Was You (1997)          2.675676  2.733333
1-900 (1994)                       2.000000  3.000000

Note how "rating" no longer appears at the top of the dataframe because we've selected it away with df = df["rating"]

Cameron Riddell
  • 10,942
  • 9
  • 19