
I'd like to sort this DataFrame by id and then by version in natural order (the real df has many more columns of different data types):

import pandas as pd
from natsort import index_natsorted
import numpy as np

data = {"version":["3.1.1","3.1.10","3.1.2","3.1.3", "4.1.6"], 
        "id":[2,2,2,2,1]}

df = pd.DataFrame(data)

df.sort_values(by=["id","version"], key=lambda x: np.argsort(index_natsorted(df["version"])), ignore_index=True)

  version  id
   3.1.1   2
   3.1.2   2
   3.1.3   2
  3.1.10   2
   4.1.6   1
BERA

1 Answer


Use DataFrame.sort_values with both columns, and pass the (id, version) pairs to natsort's index_natsorted so that the sort key reflects both columns:

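import numpy as np
import pandas as pd
from natsort import index_natsorted

data = {"version": ["3.1.1", "3.1.10", "3.1.2", "3.1.3", "2.1.6"],
        "id": [2, 2, 5, 2, 1]}

df = pd.DataFrame(data)

# index_natsorted returns the row positions in natural order of the (id, version) pairs;
# np.argsort turns those positions into a rank per row. The key ignores its argument,
# so every column in `by` receives the same ranks.
df = df.sort_values(by=["id", "version"],
                    key=lambda x: np.argsort(index_natsorted(zip(df["id"], df["version"]))),
                    ignore_index=True)

print(df)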
  version  id
0   2.1.6   1
1   3.1.1   2
2   3.1.3   2
3  3.1.10   2
4   3.1.2   5
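
A possible alternative, as a rough sketch rather than a drop-in replacement: because sort_values calls key once per column in by, the key can transform only the version column and leave id untouched. The helpers version_rank and per_column_key below are hypothetical names, not part of pandas or natsort. The sketch assumes every version component is a plain integer (no suffixes such as "rc1") and that pandas passes each column to key with its name set, which I believe is the case when by has more than one column.

```python
import pandas as pd

data = {"version": ["3.1.1", "3.1.10", "3.1.2", "3.1.3", "2.1.6"],
        "id": [2, 2, 5, 2, 1]}
df = pd.DataFrame(data)


def version_rank(col):
    """Hypothetical helper: map dotted version strings to integer ranks."""
    # Assumption: every component is a plain integer, e.g. "3.1.10" -> (3, 1, 10).
    keys = [tuple(int(part) for part in v.split(".")) for v in col]
    # Rank the distinct tuples so pandas only has to compare integers.
    rank_of = {k: i for i, k in enumerate(sorted(set(keys)))}
    return pd.Series([rank_of[k] for k in keys], index=col.index)


def per_column_key(col):
    # Called once per column in `by`; only "version" is transformed.
    return version_rank(col) if col.name == "version" else col


result = df.sort_values(by=["id", "version"], key=per_column_key, ignore_index=True)
print(result)
```

This avoids the natsort dependency and keeps each column's key independent, at the cost of handling only purely numeric versions. For mixed strings such as "3.1.10b", natsort is probably still the safer choice.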
jezrael