I have a list of software releases identified by version. The software follows the Semantic Versioning specification, so each version has a major number, a minor number, and optionally a patch number:

  • 0.1
  • 0.2
  • 0.2.1
  • 0.3
  • ...
  • 0.10
  • 0.10.1

Is there a way in pandas to sort these versions so that 0.2 is bigger than 0.1 but smaller than 0.10?

5 Answers

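You can use distutils from the standard library for this, on Python versions that still ship it (distutils was deprecated in Python 3.10 and removed in 3.12):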

from distutils.version import StrictVersion
versions = ['0.1', '0.10', '0.2.1', '0.2', '0.10.1']
versions.sort(key=StrictVersion)

Now it's sorted like this: ['0.1', '0.2', '0.2.1', '0.10', '0.10.1']
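
On interpreters without distutils, a dependency-free key function is one option. The following is a minimal sketch, assuming every version is a plain dotted sequence of integers with no pre-release or build suffix; `version_key` is a hypothetical helper name, not part of any library.

```python
def version_key(version: str) -> tuple:
    """Turn '0.10.1' into (0, 10, 1) so that tuples compare numerically."""
    return tuple(int(part) for part in version.split("."))


versions = ['0.1', '0.10', '0.2.1', '0.2', '0.10.1']
sorted_versions = sorted(versions, key=version_key)
print(sorted_versions)  # ['0.1', '0.2', '0.2.1', '0.10', '0.10.1']
```

Tuples compare element by element, so (0, 2) sorts before (0, 2, 1), which sorts before (0, 10).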

jarcobi889

A pandas solution: sort the values with sorted, using StrictVersion as the key, and assign the result back to the column:

print (df)
      ver
0     0.1
1     0.2
2    0.10
3   0.2.1
4     0.3
5  0.10.1

from distutils.version import StrictVersion

df['ver'] = sorted(df['ver'], key=StrictVersion)
print (df)
      ver
0     0.1
1     0.2
2   0.2.1
3     0.3
4    0.10
5  0.10.1

EDIT:

To sort by the index, you can use reindex:

print (df)
        a  b
ver         
0.1     1  q
0.2     2  w
0.10    3  e
0.2.1   4  r
0.3     5  t
0.10.1  6  y

import pandas as pd
from distutils.version import StrictVersion

df = df.reindex(index=pd.Index(sorted(df.index, key=StrictVersion)))
print (df)
        a  b
0.1     1  q
0.2     2  w
0.2.1   4  r
0.3     5  t
0.10    3  e
0.10.1  6  y
jezrael
  • So, it turns out I still have a problem with this: if there are other columns, this does not reorder the values in them. Is there a way to accomplish something like sort_index with key=StrictVersion? – Muri Nicanor Jun 25 '17 at 12:19
  • Give me some time. – jezrael Jun 25 '17 at 12:20
  • Super, glad I could help! – jezrael Jun 25 '17 at 19:24
  • I need to use the index solution but it returns an error, because I have `Unknown` version too and it says: ValueError: invalid version number 'Unknown' – Sergey Sergeev Aug 01 '21 at 19:31
  • `StrictVersion` unfortunately doesn't follow the Semantic Versioning spec. For example, `1.0.4a3` is considered a valid version, but `0.2.0-rc.1+e1acc3.win64` isn't, even though it conforms to the Semantic Versioning spec. – Marv Sep 16 '21 at 12:00
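
For the `Unknown` case raised in the comments, one possible approach is a key function that catches the parse failure and pushes such labels to the end. This is only a sketch: `safe_version_key` is a hypothetical name, and it assumes valid versions are plain dotted integers.

```python
def safe_version_key(version: str) -> tuple:
    """Sort parseable versions numerically; everything else goes last."""
    try:
        parts = tuple(int(part) for part in version.split("."))
    except ValueError:
        # Labels such as 'Unknown' sort after all real versions,
        # alphabetically among themselves.
        return (1, (), version)
    return (0, parts, "")


labels = ['0.10', 'Unknown', '0.2', '0.2.1', '0.1']
ordered = sorted(labels, key=safe_version_key)
print(ordered)  # ['0.1', '0.2', '0.2.1', '0.10', 'Unknown']
```

The same key could be passed to the sorted(df.index, ...) call in the reindex approach above in place of StrictVersion.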

Those work fine if your values are unique, but here is the best solution I've found for columns of semantic versions that may contain duplicates.

import pandas as pd
from distutils.version import StrictVersion

# Sort the distinct versions, then collect the rows for each one in that order.
unique_sorted_versions = sorted(set(df['Version']), key=StrictVersion)

groups = [df[df['Version'] == version]
          for version in unique_sorted_versions]

new_df = pd.concat(groups)
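
A variation on the same idea, offered as a sketch rather than a drop-in replacement: compute the row order in plain Python and reorder whole rows by position, which handles duplicates without building one sub-frame per version. The `version_key` helper, the `build` column, and the sample data are assumptions made for illustration, and versions are assumed to be plain dotted integers.

```python
import pandas as pd


def version_key(version: str) -> tuple:
    """Turn '0.10.1' into (0, 10, 1) for numeric comparison."""
    return tuple(int(part) for part in version.split("."))


df = pd.DataFrame({
    "Version": ["0.10", "0.2", "0.10", "0.1", "0.2.1"],
    "build": [1, 2, 3, 4, 5],
})

# sorted() is stable, so rows with the same version keep their original order.
positions = sorted(range(len(df)),
                   key=lambda i: version_key(df["Version"].iloc[i]))
new_df = df.iloc[positions].reset_index(drop=True)
print(new_df)
```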
Jeff

I came across this problem too. After a lot of googling (the first page I found was this SO question :D), I think my solution is worth mentioning.

At the time of writing, pandas has two sort functions, sort_values and sort_index, and neither has a key parameter for passing a custom sort function (a key parameter was added to both later, in pandas 1.1.0). See this github issue.

jezrael's answer is very helpful and I'll build my solution based on that.

df['ver'] = sorted(df['ver'], key=StrictVersion) is useful only if the version column is the only column in the DataFrame; otherwise the other columns have to be reordered along with the version column.

jezrael reindexes the DataFrame, because the desired index order can be obtained with the built-in sorted function, which does have a key parameter.

But what if the version is not the index and I don't want to call set_index('ver')?

We can use apply to map each original version string to a StrictVersion object; sort_values will then sort in the desired order:

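from distutils.version import StrictVersion
# Convert each version string to a StrictVersion so comparisons are numeric.
df['ver'] = df['ver'].apply(StrictVersion)
# sort_values returns a new DataFrame, so assign the result.
df = df.sort_values(by='ver')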
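
If the installed pandas is 1.1.0 or later, sort_values accepts a key callable, so the column does not have to be overwritten with version objects. A minimal sketch under that assumption; `version_key` is a hypothetical helper that only handles plain dotted integers, and the sample frame is made up.

```python
import pandas as pd


def version_key(version: str) -> tuple:
    """Turn '0.10.1' into (0, 10, 1) for numeric comparison."""
    return tuple(int(part) for part in version.split("."))


df = pd.DataFrame({
    "ver": ["0.1", "0.2", "0.10", "0.2.1", "0.3", "0.10.1"],
    "a": [1, 2, 3, 4, 5, 6],
})

# key receives the whole column as a Series and must return a Series of the
# same length; the 'ver' column itself keeps its original strings.
sorted_df = df.sort_values(by="ver", key=lambda col: col.map(version_key))
print(sorted_df)
```

The other columns move with their rows, and 'ver' stays a column of strings.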
bigeast
  • If you need to sort by multiple columns, you will get an error message because StrictVersion is not hashable. Here is the workaround: `df["ver"] = df["ver"].apply(StrictVersion) ; df["ver_tuple"] = df["ver"].map(lambda x: vars(x)['version']) ; df.sort_values(by=['package', 'ver_tuple'], ascending=[True, False])` – nicolas.f.g Feb 22 '23 at 10:49
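
Another way to handle the multi-column case from the comment above is to sort on an integer rank instead of on version objects or tuples. This is a sketch only: `version_key`, the `ver_rank` helper column, and the sample data are illustrative assumptions, and versions are assumed to be plain dotted integers.

```python
import pandas as pd


def version_key(version: str) -> tuple:
    """Turn '0.10.1' into (0, 10, 1) for numeric comparison."""
    return tuple(int(part) for part in version.split("."))


df = pd.DataFrame({
    "package": ["b", "a", "a", "b", "a"],
    "ver": ["0.2", "0.10", "0.2", "0.10.1", "0.2.1"],
})

# Map each distinct version to its integer position in version order.
rank = {v: i for i, v in enumerate(sorted(set(df["ver"]), key=version_key))}
df["ver_rank"] = df["ver"].map(rank)

# Packages ascending, newest version first within each package.
result = (
    df.sort_values(by=["package", "ver_rank"], ascending=[True, False])
      .drop(columns="ver_rank")
      .reset_index(drop=True)
)
print(result)
```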

You could come up with something like this:

for module, versions in result.items():
    result[module] = sorted(
        versions, key=lambda x: mixutil.SemVersion(x.version), reverse=True
    )
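
The mixutil.SemVersion helper is not shown here. As a hypothetical stand-in, the sketch below builds a sort key that follows the Semantic Versioning precedence rules in simplified form: build metadata is ignored, a pre-release sorts before its release, and numeric pre-release identifiers sort before alphanumeric ones. The name `semver_key` and the regular expression are assumptions for illustration, not the author's implementation.

```python
import re

# Simplified pattern: MAJOR.MINOR.PATCH, optional -prerelease, optional +build.
_SEMVER = re.compile(
    r"^(\d+)\.(\d+)\.(\d+)"
    r"(?:-([0-9A-Za-z.-]+))?"
    r"(?:\+[0-9A-Za-z.-]+)?$"
)


def semver_key(version: str) -> tuple:
    match = _SEMVER.match(version)
    if match is None:
        raise ValueError(f"not a semantic version: {version!r}")
    major, minor, patch, prerelease = match.groups()
    core = (int(major), int(minor), int(patch))
    if prerelease is None:
        # A release outranks any of its own pre-releases.
        return core + (1, ())
    identifiers = tuple(
        # Numeric identifiers compare as integers and rank below text ones.
        (0, int(part), "") if part.isdigit() else (1, 0, part)
        for part in prerelease.split(".")
    )
    return core + (0, identifiers)


versions = ["1.0.0", "1.0.0-rc.1", "0.2.0-rc.1+e1acc3.win64",
            "1.0.0-alpha", "1.0.0-alpha.1", "0.10.0", "0.2.0"]
ordered = sorted(versions, key=semver_key)
print(ordered)
```

Passing reverse=True to sorted, as in the loop above, would put the newest version first.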
kotsky