Sort dictionary by key numeric with alphanumeric data

Question

I have a (Python) list of dictionaries that looks like this:

[
    {
        "data": "somedata1",
        "name": "prefix1.7.9"
    },
    {
        "data": "somedata2",
        "name": "prefix1.7.90"
    },
    {
        "data": "somedata3",
        "name": "prefix1.1.1"
    },
    {
        "data": "somedata4",
        "name": "prefix4.1.1"
    },
    {
        "data": "somedata5",
        "name": "prefix4.1.2"
    },
    {
        "data": "somedata5",
        "name": "other 123"
    },
    {
        "data": "somedata6",
        "name": "different"
    },  
    {
        "data": "somedata7",
        "name": "prefix1.7.11"
    },
    {
        "data": "somedata7",
        "name": "prefix1.11.9"
    },
    {
        "data": "somedata7",
        "name": "prefix1.17.9"
    }   
]

Now I want to sort it by the "name" key. If the postfix consists of numbers (separated by dots), I want those parts sorted numerically, e.g. with this resulting order:

different
other 123
prefix1.1.1
prefix1.7.9
prefix1.7.11
prefix1.7.90
prefix1.11.9
prefix1.17.9
prefix4.1.1
prefix4.1.2

Do you have an idea how to do this in a short and efficient way? The only idea I had was to build a completely new list, but perhaps this could also be done using a lambda function?
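To make the problem concrete, here is a minimal sketch of what a plain `sorted` call with a lambda does on this kind of data. The three-item `items` list is a shortened, hypothetical sample in the same shape as the list above, not the full input.

```python
# Hypothetical, shortened sample in the same shape as the question's data.
items = [
    {"data": "somedata1", "name": "prefix1.7.9"},
    {"data": "somedata7", "name": "prefix1.7.11"},
    {"data": "somedata7", "name": "prefix1.11.9"},
]

# Plain string comparison works character by character,
# so "11" sorts before "7" and "7.11" sorts before "7.9".
lexicographic = sorted(items, key=lambda d: d["name"])
names = [d["name"] for d in lexicographic]
print(names)
```

The lambda itself is fine; what is missing is a key that compares the numeric parts as integers rather than as text.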

https://stackoverflow.com/questions/72899/how-do-i-sort-a-list-of-dictionaries-by-a-value-of-the-dictionary — falm, Oct 12 '22 at 09:51
Does this answer your question? [Is there a built in function for string natural sort?](https://stackoverflow.com/questions/4836710/is-there-a-built-in-function-for-string-natural-sort) — Stef, Oct 12 '22 at 09:58

blhsing · Accepted Answer · 2022-10-12T10:27:39.310

You can use re.findall with a regex that extracts either non-numerical words or digits from each name, and convert those that are digits to integers for numeric comparisons. To avoid comparisons between strings and integers, make the key a tuple where the first item is a Boolean of whether the token is numeric and the second item is the actual key for comparison:

import re

# initialize your input list as the lst variable
lst.sort(
    key=lambda d: [
        (s.isdigit(), int(s) if s.isdigit() else s)
        for s in re.findall(r'[^\W\d]+|\d+', d['name'])
    ]
)

Demo: https://replit.com/@blhsing/ToughWholeInformationtechnology
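As a small illustration of the same idea, the key can be pulled out into a named function and tried on plain strings. The name `natural_key` and the sample `names` list are assumptions made for this sketch. The list includes the `prefix1.hello` case raised in the comments.

```python
import re


def natural_key(name):
    """Split a name into (is_number, value) tokens for natural ordering."""
    return [
        (token.isdigit(), int(token) if token.isdigit() else token)
        for token in re.findall(r"[^\W\d]+|\d+", name)
    ]


# Hypothetical sample mixing numeric and non-numeric postfixes.
names = [
    "prefix1.7.90",
    "prefix1.hello",
    "prefix1.7.9",
    "different",
    "prefix1.11.9",
    "other 123",
]
ordered = sorted(names, key=natural_key)
print(ordered)
```

Because every token is a `(bool, value)` pair, an integer is only ever compared with another integer and a string with another string. With this key, text tokens sort before numeric tokens in the same position.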

edited Oct 12 '22 at 10:27

answered Oct 12 '22 at 10:10


Can you try adding `{ "data": "dangerousdata", "name": "prefix1.hello"},` to the data? I suspect it would crash with `TypeError: '<' not supported between instances of 'str' and 'int'`. – Stef Oct 12 '22 at 10:15
Nice and short solution :) – kruemel4 Oct 12 '22 at 11:58

score 0 · Answer 2 · answered Oct 12 '22 at 09:59

You need to come up with a way of extracting your prefix, and your postfix from the 'name' values. This can be achieved using something like:

import math


def extract_prefix(s: str) -> str:
    return s.split('.')[0]


def extract_postfix(s: str) -> float:
    try:
        return float('.'.join(s.split('.')[1:]))
    except ValueError:
        # if we cannot form a float i.e. no postfix exists, it'll be before some value with same prefix
        return -math.inf


arr = [{'data': 'somedata1', 'name': 'prefix1.7.9'},
 {'data': 'somedata2', 'name': 'prefix1.7.90'},
 {'data': 'somedata3', 'name': 'prefix1.1.1'},
 {'data': 'somedata4', 'name': 'prefix4.1.1'},
 {'data': 'somedata5', 'name': 'prefix4.1.2'},
 {'data': 'somedata5', 'name': 'other 123'},
 {'data': 'somedata6', 'name': 'different'},
 {'data': 'somedata7', 'name': 'prefix1.7.11'},
 {'data': 'somedata7', 'name': 'prefix1.11.9'},
 {'data': 'somedata7', 'name': 'prefix1.17.9'}]


result = sorted(sorted(arr, key=lambda d: extract_postfix(d['name'])), key=lambda d: extract_prefix(d['name']))

result:

[{'data': 'somedata6', 'name': 'different'},
 {'data': 'somedata5', 'name': 'other 123'},
 {'data': 'somedata3', 'name': 'prefix1.1.1'},
 {'data': 'somedata7', 'name': 'prefix1.7.11'},
 {'data': 'somedata1', 'name': 'prefix1.7.9'},
 {'data': 'somedata2', 'name': 'prefix1.7.90'},
 {'data': 'somedata7', 'name': 'prefix1.11.9'},
 {'data': 'somedata7', 'name': 'prefix1.17.9'},
 {'data': 'somedata4', 'name': 'prefix4.1.1'},
 {'data': 'somedata5', 'name': 'prefix4.1.2'}]

In your result, `'prefix1.7.11'` comes before `'prefix1.7.9'`. I would expect 9 to come before 11. One way to fix that is to use `itertools.groupby(s, key=str.isdigit)` instead of `str.split('.')`. But I recommend using a library function instead of patching things yourself. See for instance [Special Cases Everywhere](https://natsort.readthedocs.io/en/master/howitworks.html#special-cases-everywhere) in the documentation for the `natsort` library. – Stef, Oct 12 '22 at 10:02
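The misordering pointed out in this comment comes from reading the postfix as a decimal fraction. A short sketch of the difference, where `numeric_parts` is a hypothetical helper introduced only for this illustration:

```python
# As floats, "7.11" is a smaller number than "7.9",
# and "7.90" is the same number as "7.9".
assert float("7.11") < float("7.9")
assert float("7.90") == float("7.9")


def numeric_parts(postfix):
    """Hypothetical helper: turn '7.11' into the tuple (7, 11)."""
    return tuple(int(part) for part in postfix.split("."))


# As tuples of integers, the parts compare one by one: 9 < 11 < 90.
in_order = numeric_parts("7.9") < numeric_parts("7.11") < numeric_parts("7.90")
print(in_order)
```

This helper assumes every part is a plain integer. It would raise `ValueError` on a postfix such as `hello`.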

score 0 · Answer 3 · answered Oct 12 '22 at 10:09

Since you want to sort numerically you will need a helper function:

def split_name(s):
    nameparts = s.split('.')
    for i,p in enumerate(nameparts):
        if p.isdigit():
            nameparts[i] = int(p)
    return nameparts

# list.sort() sorts in place and returns None, so its result must not be assigned
obj.sort(key=lambda x: split_name(x['name']))

answered Oct 12 '22 at 10:09

gimix


Splitting on `.` fails to identify the first number, before the first `.`. For instance, try with `prefix9.1.1` and `prefix10.1.1` in the data. I expect `prefix10` should come after `prefix9`. I think one way to fix this is to use `itertools.groupby(s, key=str.isdigit)` instead of `s.split('.')`. Also, I think your function might crash if two strings have different structure, such as `prefix1.1.1` and `prefix1.hello`, because it would try to compare int `1` with string `'hello'`, which is okay in Python 2 but raises a `TypeError` in Python 3. – Stef Oct 12 '22 at 10:12
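One possible way to apply the `groupby` suggestion from this comment is sketched below. The function name `groupby_key` and the sample names are assumptions for illustration. The `(is_digit, value)` pairing is borrowed from the accepted answer to avoid int/str comparisons.

```python
from itertools import groupby


def groupby_key(name):
    """Hypothetical helper: split a name into runs of digits and non-digits."""
    return [
        # Each run becomes (True, int) for digits or (False, str) otherwise.
        (is_digit, int("".join(chars)) if is_digit else "".join(chars))
        for is_digit, chars in groupby(name, key=str.isdigit)
    ]


# Hypothetical sample covering both cases mentioned in the comment.
names = ["prefix10.1.1", "prefix9.1.1", "prefix1.hello", "prefix1.1.1"]
ordered = sorted(names, key=groupby_key)
print(ordered)
```

Unlike splitting on `.`, this separates `prefix` from the number that follows it, so `prefix9` sorts before `prefix10`. The dots stay inside the non-digit runs, which is harmless for names that share the same layout.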

R. Baraiya · Answer 4 · 2022-10-12T12:56:33.927

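Here I first sort the names by version and store the result in a second list called rank. That list records each name's position and is then used as the lookup for the custom sort of the dictionaries. Mydata is the input list.

Code using pkg_resources (note that this depends on parse_version accepting arbitrary strings; recent setuptools releases are stricter and may raise InvalidVersion for names such as prefix1.7.9 that are not valid PEP 440 versions):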

from pkg_resources import parse_version

rank=sorted([v['name'] for v in Mydata], key=parse_version)

or

rank = sorted(sorted([v['name'] for v in Mydata], key=parse_version), key = lambda s: s[:3]=='pre') #To avoid the prefix value in sorting
sorted(Mydata, key = lambda x: rank.index(x['name']))

Output:

[{'data': 'somedata6', 'name': 'different'},
 {'data': 'somedata5', 'name': 'other 123'},
 {'data': 'somedata3', 'name': 'prefix1.1.1'},
 {'data': 'somedata1', 'name': 'prefix1.7.9'},
 {'data': 'somedata7', 'name': 'prefix1.7.11'},
 {'data': 'somedata2', 'name': 'prefix1.7.90'},
 {'data': 'somedata7', 'name': 'prefix1.11.9'},
 {'data': 'somedata7', 'name': 'prefix1.17.9'},
 {'data': 'somedata4', 'name': 'prefix4.1.1'},
 {'data': 'somedata5', 'name': 'prefix4.1.2'}]

With other inputs:

[{'data': 'somedata6', 'name': 'Aop'},
 {'data': 'somedata6', 'name': 'different'},
 {'data': 'somedata5', 'name': 'other 123'},
 {'data': 'somedata7', 'name': 'pop'},
 {'data': 'somedata3', 'name': 'prefix1.hello'},
 {'data': 'somedata3', 'name': 'prefix1.1.1'},
 {'data': 'somedata4', 'name': 'prefix1.2.hello'},
 {'data': 'somedata1', 'name': 'prefix1.7.9'},
 {'data': 'somedata7', 'name': 'prefix1.7.11'},
 {'data': 'somedata2', 'name': 'prefix1.7.90'},
 {'data': 'somedata7', 'name': 'prefix1.17.9'},
 {'data': 'somedata7', 'name': 'prefix1.17.9'},
 {'data': 'somedata5', 'name': 'prefix4.1.2'},
 {'data': 'somedata7', 'name': 'prefix9.1.1'},
 {'data': 'somedata7', 'name': 'prefix10.11.9'}]
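The rank-list step in this answer calls `rank.index(...)` once per item, which scans the list each time. A minimal sketch of the same two-step idea with a dictionary lookup is shown below. The names `records` and `version_key` are hypothetical, and `version_key` is a simple stand-in for `parse_version` that only handles purely numeric dotted names.

```python
# Hypothetical sample; the names here are purely numeric for simplicity.
records = [
    {"data": "a", "name": "1.7.9"},
    {"data": "b", "name": "1.11.9"},
    {"data": "c", "name": "1.7.11"},
    {"data": "d", "name": "1.7.9"},
]


def version_key(name):
    """Stand-in for parse_version: '1.7.11' becomes (1, 7, 11)."""
    return tuple(int(part) for part in name.split("."))


# Step 1: rank each distinct name by its version order.
unique_names = sorted({r["name"] for r in records}, key=version_key)
rank = {name: position for position, name in enumerate(unique_names)}

# Step 2: sort the records by looking up each name's rank in the dict.
ordered = sorted(records, key=lambda r: rank[r["name"]])
print([r["data"] for r in ordered])
```

In practice the intermediate rank can usually be skipped by passing the key function to `sorted` directly, for example `sorted(records, key=lambda r: version_key(r["name"]))`.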
