Full code sample below

Let's take the iris dataset as an example. The dataframe has 150 rows, and the column species contains three unique values: ['setosa', 'versicolor', 'virginica']. They appear in the dataframe column in the same order as in that list. But how do I specify an order of my own?

The original order can be found using:

# In:
df['species'].unique()

# Out:
# array(['setosa', 'versicolor', 'virginica'], dtype=object)

A reverse alphabetical order is easily applied like this:

# In:
df_alpha = df.sort_values(by='species', ascending=False)
df_alpha['species'].unique()

# Out:
# array(['virginica', 'versicolor', 'setosa'], dtype=object)

But how can I specify the order to be ['virginica', 'setosa', 'versicolor']?


Code and reproducible data:

import pandas as pd
import plotly.express as px

df = px.data.iris()
df_alpha = df.sort_values(by='species', ascending=False)
df_alpha.tail()

Structure of the dataframe:

     sepal_length  sepal_width  petal_length  petal_width    species  species_id
0             5.1          3.5           1.4          0.2     setosa           1
1             4.9          3.0           1.4          0.2     setosa           1
2             4.7          3.2           1.3          0.2     setosa           1
3             4.6          3.1           1.5          0.2     setosa           1
4             5.0          3.6           1.4          0.2     setosa           1
...
145           6.7          3.0           5.2          2.3  virginica           3
146           6.3          2.5           5.0          1.9  virginica           3
147           6.5          3.0           5.2          2.0  virginica           3
148           6.2          3.4           5.4          2.3  virginica           3
149           5.9          3.0           5.1          1.8  virginica           3
vestland
  • I think you need `ordered categorical`s – jezrael Jan 13 '20 at 15:13
  • @jezrael I tried the different approaches to the linked question, but I could not get them to work. Any further suggestions on how to get it to work on this dataset? I think the data sample in the linked question is too limited. – vestland Jan 13 '20 at 15:24
  • so `cats = ['virginica', 'setosa', 'versicolor']`, `df['species'] = pd.CategoricalIndex(df['species'], ordered=True, categories=cats)`, `df_alpha = df.sort_values('species')` not working for you? – jezrael Jan 13 '20 at 16:14
  • @jezrael Now it did! I must have messed up something along the way. Thank you so much for your help! – vestland Jan 14 '20 at 08:32
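
For reference, here is a minimal sketch of the ordered-categorical approach suggested in the comments. It is only an illustration under a few assumptions: it uses a tiny hand-built stand-in for the iris dataframe (so it runs without plotly), it uses pd.Categorical rather than the pd.CategoricalIndex from the comment, and the names df_custom and order are made up for this example.

```python
import pandas as pd

# Tiny stand-in for the iris dataframe (assumption: two rows per species
# are enough to show the ordering; the real data has 150 rows).
df = pd.DataFrame({
    "species": ["setosa", "setosa", "versicolor", "versicolor",
                "virginica", "virginica"],
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
})

# The custom order asked for in the question.
cats = ["virginica", "setosa", "versicolor"]

# Turn the column into an ordered categorical so that sorting follows
# the order of `cats` instead of alphabetical order.
df["species"] = pd.Categorical(df["species"], categories=cats, ordered=True)

# mergesort is stable, so rows keep their original order within a species.
df_custom = df.sort_values("species", kind="mergesort")

# Unique values in order of appearance after sorting.
order = df_custom["species"].unique().tolist()
print(order)
```

Note that this changes the dtype of species from object to category. If that is a problem elsewhere in your code, one option is to sort a copy of the dataframe and leave the original untouched.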
