Quick question. I have a huge CSV (100,000+ rows) that contains version data (e.g. Python 3.7.3), and I'm trying to compare the versions to figure out how recent they are. Without resorting to regular expressions or extensive list comprehensions, is there a clean, Pythonic way to convert strings such as 3.7.3 to a float? For speed and readability I would prefer a one- or two-liner.

Thanks in advance.

Yellocat
  • Convert 3.7.3 to what float? – khelwood Jan 11 '21 at 09:17
  • To a number, so that I can use max() on a list containing it. – Yellocat Jan 11 '21 at 09:19
  • You don't want to convert 3.7.3 to a float, you want to convert it to something like a 3-tuple so you can compare them to each other, e.g. `tuple(map(int, "3.7.3".split(".")))` – Samwise Jan 11 '21 at 09:19
  • But **what** number? 373.0? 37.3? 3.73? 0.373? And then how will you distinguish 3.7.3 from 37.3.0 (if we ever hit that version) or from 3.37.0? – Mike Scotty Jan 11 '21 at 09:19
  • @MikeScotty 3.73 would be ideal :) – Yellocat Jan 11 '21 at 09:20
  • What if you had a version number like 3.20.1? Would you want that to be rendered as 3.201, which would be considered less than 3.73? – Samwise Jan 11 '21 at 09:22
  • Personally, I'd suggest sticking with the version string as-is (for the reasons pointed out by me and by @Samwise) and reading through the answers here: [How do I compare version numbers in Python?](https://stackoverflow.com/q/11887762/4349415) – Mike Scotty Jan 11 '21 at 09:23
  • Thanks for the enlightenment @Samwise!! Also, tuples sound like a brilliant idea. I'll try that. – Yellocat Jan 11 '21 at 09:24

1 Answer

Converting a three-part version string to a single float won't give you numbers that you can compare meaningfully in all cases. Consider the versions:

3.15.1
3.7.4
3.7.3
2.20.5

If you simply ignore the second period, you end up with:

3.151
3.74
3.73
2.205

which breaks the ordering.
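
To see the failure concretely, here is a small sketch. The helper `naive_float` is a hypothetical name made up for this illustration; it just drops every period after the first one before parsing:

```python
# Hypothetical helper, for illustration only: keep the first period,
# drop the rest, and parse what remains as a float.
def naive_float(version):
    major, _, rest = version.partition(".")
    return float(major + "." + rest.replace(".", ""))

versions = ["3.15.1", "3.7.4", "3.7.3", "2.20.5"]
floats = [naive_float(v) for v in versions]
print(floats)                          # [3.151, 3.74, 3.73, 2.205]
print(max(versions, key=naive_float))  # 3.7.4, although 3.15.1 is newer
```

The float for 3.15.1 ends up smaller than the floats for 3.7.3 and 3.7.4, so `max` picks the wrong version.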

If you represent these as tuples, e.g.:

(3, 15, 1)
(3, 7, 4)
(3, 7, 3)
(2, 20, 5)

they'll sort correctly, as shown here:

>>> versions = ["3.7.3", "3.15.1", "3.7.4", "2.20.5"]
>>> [tuple(map(int, v.split("."))) for v in versions]
[(3, 7, 3), (3, 15, 1), (3, 7, 4), (2, 20, 5)]
>>> max(tuple(map(int, v.split("."))) for v in versions)
(3, 15, 1)

and if you want to turn it back into a string at the end:

>>> ".".join(map(str, max(tuple(map(int, v.split("."))) for v in versions)))
'3.15.1'

or use the `key` argument to `max` to do the conversion on the fly, at the time the comparisons are made, so the result is still the original string:

>>> max(versions, key=lambda v: tuple(map(int, v.split("."))))
'3.15.1'
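
If the CSV cells look like the "Python 3.7.3" example from the question, the same idea can be wrapped in a small key function. This is only a sketch under assumptions: `version_key` and the sample `rows` are made up here, and the code assumes the version is the last whitespace-separated token and consists purely of digits and periods.

```python
def version_key(text):
    """Turn 'Python 3.7.3' or '3.7.3' into a comparable tuple such as (3, 7, 3)."""
    # Assumption: the version is the last whitespace-separated token
    # and contains only digits and periods.
    number = text.split()[-1]
    return tuple(int(part) for part in number.split("."))

# Made-up sample values standing in for one CSV column.
rows = ["Python 3.7.3", "Python 3.15.1", "Python 3.7.4", "Python 2.20.5"]

newest = max(rows, key=version_key)      # newest entry, still a string
ordered = sorted(rows, key=version_key)  # oldest to newest
print(newest)
print(ordered)
```

Tuples compare element by element, so a shorter version such as 3.7 sorts just before 3.7.0. Anything that is not purely numeric (for example a release candidate like 3.9.0rc1) makes `int` raise a `ValueError`; for data like that, the `Version` class from the third-party `packaging` library is probably a better fit than hand-rolled tuples.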
Samwise