2

I have a pandas DataFrame like below:

df = pd.DataFrame({["id": [1, 2, 3, 4, 5],
                    "fruit": ["apple", "banana", "pineapple", "orange", "orange"],
                    "trash": [38, 22, 93, 1, 15]})

Now I want to sort the rows of this DataFrame on column fruit, according to a provided ordered list. Lets say this list is:

ordered_list = ["pinapple", "banana", "orange", "apple"]

I would like to produce the following output (I don't care about the order of rows with the same fruit value, so the orange rows might be reversed):

id        fruit        trash
3         pineapple    93
2         banana       22
4         orange       1
5         orange       15
1         apple        38

How to solve this? I know I could get around with some for-loops, but it would surprise me if there isn't a more elegant solution (which I'm not able to find). The solution should be as time-efficient as possible, since the real DataFrame consists of about 5000 rows (with around 50 unique fruit values)

Peter
  • 722
  • 6
  • 24

1 Answers1

7

Approach 1

Convert the Fruit column to ordered categorical type and sort the values

df['fruit'] = pd.Categorical(df['fruit'], ordered_list, ordered=True)
df.sort_values('fruit')

Approach 2

Sort the values by passing a key function, which maps the fruit names to there corresponding order

df.sort_values('fruit', key=lambda x: x.map({v:k for k, v in enumerate(ordered_list)}))

   id      fruit  trash
2   3  pineapple     93
1   2     banana     22
3   4     orange      1
4   5     orange     15
0   1      apple     38
Shubham Sharma
  • 68,127
  • 6
  • 24
  • 53