
I am trying to output a dictionary as JSON using Python 2.7 (this cannot be upgraded). The keys in the data are strings that contain numbers, like 'item_10', and are in arbitrary order. For example, this code generates some test data:

import json
import random

data = {}
numbers = list(range(1, 12))
random.shuffle(numbers)
for value in numbers:
    data['item_{}'.format(value)] = 'data{}'.format(value)

I tried using:

print(json.dumps(data, sort_keys=True, indent=2))

However, I want the keys to be sorted naturally, like:

{
  "item_1": "data1",  
  "item_2": "data2",
  ...
  "item_10": "data10",
  "item_11": "data11"
}

Instead, I get keys sorted by Python's default sort order:

{
  "item_1": "data1",
  "item_10": "data10",
  "item_11": "data11",
  ...
  "item_2": "data2"
}

How can I get this result?

Karl Knechtel
newdeveloper
  • What you are looking for is the natural sort order. https://stackoverflow.com/questions/4836710/is-there-a-built-in-function-for-string-natural-sort I would use the library natsort. – Tom McLean Feb 23 '23 at 13:12
  • @TomMcLean I was afraid of that answer. Unfortunately I can't bring in modules. I will try doing one of the examples on the link thank you. – newdeveloper Feb 23 '23 at 13:16
  • Unfortunately, `sort_keys` doesn't allow for specifying a key for the sorting algorithm (ironically?). However, since dicts in 3.7+ preserve order, they can at least theoretically be sorted. It's also possible to hook into the JSON module in a variety of ways to customize the output. I closed this as a duplicate, but I think it's no longer a duplicate if it's rephrased to be about *how to apply* a custom sort order to the `json.dumps` output. – Karl Knechtel Feb 23 '23 at 13:19
  • @KarlKnechtel That would be nice, but the code is on version 2.7 and as of now I am stuck with it. – newdeveloper Feb 23 '23 at 13:22
  • Just to make sure, you understand that 2.7 is more than 3 years past its EOL, and is comparably as outdated as Windows 7? – Karl Knechtel Feb 23 '23 at 13:23
  • Yes that is correct. I have no control over that. – newdeveloper Feb 23 '23 at 13:25
  • @newdeveloper An answer in that link uses an example with a regex string, if that works – Tom McLean Feb 23 '23 at 14:00
  • I edited the question to give what should be a version-agnostic MRE and description. – Karl Knechtel Feb 23 '23 at 14:01
  • Since you have the constraints of working in legacy Python 2.7 and cannot import additional libraries, you should add those constraints either to the question title or at the start of the question statement. I feel for you. I had to work with Python 2.0 for some scripting in a non-updatable IDE before. Not fun. – RufusVS Feb 23 '23 at 14:12
  • @RufusVS the question can be answered in a fairly language-agnostic way, and ways that are more elegant in more recent versions could be of interest to other people. I see no reason to narrow the scope prematurely. – Karl Knechtel Feb 23 '23 at 14:20
  • @KarlKnechtel - You are absolutely right: Simply because the OP has some constraints, other folks who have the same problem (without the constraints) may find an answer that suits them. – RufusVS Feb 23 '23 at 14:29

2 Answers


By making the keys "naturally comparable"

Supposing that we have a key function that implements the natural-sort comparison, as in Claudiu's answer for the related question:

import re

def natural_sort_key(s, _nsre=re.compile('([0-9]+)')):
    return [int(text) if text.isdigit() else text.lower()
            for text in _nsre.split(s)]
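
To make the behaviour of this key function concrete, here is a small sketch (the sample keys are made up for illustration). It splits each string into runs of text and digits, converts the digit runs to integers, and lets ordinary list comparison do the rest:

```python
import re

def natural_sort_key(s, _nsre=re.compile('([0-9]+)')):
    # Split into alternating non-digit / digit chunks; digits become ints
    return [int(text) if text.isdigit() else text.lower()
            for text in _nsre.split(s)]

sample_keys = ['item_10', 'item_2', 'item_1']  # illustrative data only

key_for_item_10 = natural_sort_key('item_10')
print(key_for_item_10)   # ['item_', 10, '']

# Default string ordering compares character by character
default_order = sorted(sample_keys)
print(default_order)     # ['item_1', 'item_10', 'item_2']

# With the key, the numeric parts are compared as numbers
natural_order = sorted(sample_keys, key=natural_sort_key)
print(natural_order)     # ['item_1', 'item_2', 'item_10']
```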

Then we can create a wrapper class for strings which is compared using that function, transform the keys of the dict, and proceed as before:

from functools import total_ordering

@total_ordering
class NaturalSortingStr(str):
    def __lt__(self, other):
        return natural_sort_key(self) < natural_sort_key(other)

fixed = {NaturalSortingStr(k): v for k, v in data.items()}

print(json.dumps(fixed, sort_keys=True, indent=2))

Note that functools.total_ordering was added in Python 2.7 and 3.2. In older versions, we should instead define __gt__, __le__ and __ge__ explicitly, in corresponding ways. (Python's sort algorithm only uses __lt__, but it is a good idea to include consistent definitions for correctness.) Of course, the base str's implementations of __eq__ and __ne__ do not need to be replaced.

(In 2.7 we could also instead implement a corresponding __cmp__, but this will break in 3.x.)
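
As a rough sketch of the explicit definitions mentioned above, for cases where functools.total_ordering is unavailable or not wanted, each comparison can be written out in terms of natural_sort_key. The sample data is made up for illustration, and the dict is built from a generator only to avoid relying on dict comprehensions:

```python
import json
import re

def natural_sort_key(s, _nsre=re.compile('([0-9]+)')):
    return [int(text) if text.isdigit() else text.lower()
            for text in _nsre.split(s)]

class NaturalSortingStr(str):
    # Every ordering comparison delegates to the natural-sort key;
    # __eq__, __ne__ and __hash__ are inherited from str unchanged.
    def __lt__(self, other):
        return natural_sort_key(self) < natural_sort_key(other)

    def __gt__(self, other):
        return natural_sort_key(self) > natural_sort_key(other)

    def __le__(self, other):
        return natural_sort_key(self) <= natural_sort_key(other)

    def __ge__(self, other):
        return natural_sort_key(self) >= natural_sort_key(other)

# Illustrative data only
data = {'item_10': 'data10', 'item_2': 'data2',
        'item_1': 'data1', 'item_11': 'data11'}

fixed = dict((NaturalSortingStr(k), v) for k, v in data.items())
output = json.dumps(fixed, sort_keys=True, indent=2)
print(output)
```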

By putting the keys in order first

In 3.7 and up, dictionaries are guaranteed to preserve insertion order; in earlier versions, we can use collections.OrderedDict to get that property. Note that this does not sort the keys; it only maintains the order in which they were inserted.

Thus, we can determine the necessary order for keys, and create a new dict by inserting keys in that order:

import sys
if sys.version_info < (3, 7):
    # In newer versions this would also work, but is unnecessary
    from collections import OrderedDict as odict
else:
    odict = dict

sorted_keys = sorted(data.keys(), key=natural_sort_key)
sorted_data = odict((k, data[k]) for k in sorted_keys)
print(json.dumps(sorted_data, indent=2))

Since the data was sorted ahead of time, sort_keys=True is no longer necessary. In modern versions, since we are using the built-in dict, we could also write a dict comprehension (rather than passing a generator to the constructor).
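
The dict comprehension variant mentioned above might look like the following sketch. It assumes Python 3.7 or later, where the built-in dict preserves insertion order, and uses made-up sample data:

```python
import json
import re

def natural_sort_key(s, _nsre=re.compile('([0-9]+)')):
    return [int(text) if text.isdigit() else text.lower()
            for text in _nsre.split(s)]

# Illustrative data only
data = {'item_10': 'data10', 'item_2': 'data2',
        'item_1': 'data1', 'item_11': 'data11'}

# Insert the keys in natural order; the plain dict keeps that order (3.7+)
sorted_data = {k: data[k] for k in sorted(data, key=natural_sort_key)}

output = json.dumps(sorted_data, indent=2)
print(output)
```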

Karl Knechtel
  • This works in python3.x but 2.7 I am getting `TypeError: __init__() should return None, not "bool"`. I am getting to both the class and the function but throws that error. thanks for your help. – newdeveloper Feb 23 '23 at 15:04
  • Sorry, I couldn't understand the comment - I don't know what you are referring to as "this", and I can't make any sense out of "I am getting to both the class and the function but throws that error". Anyway, please keep in mind that not very many people will be set up to test 2.7 code any more. There are no `__init__` definitions in the code here, so there isn't an *obvious* cause for the problem. Please double-check and make sure that you tried a method named `__lt__`, not `__init__` - it is a completely different and unrelated thing. – Karl Knechtel Feb 23 '23 at 15:10
  • "This" was referring to the code you posted. The code runs in python3.7. When I said I am getting to both the class and the function. I was referring to the class `NaturalSortingStr` and function `natural_sort_key`. By getting to them when the code was called I put print statements in them and the statement printed. Yes I tried the correct method. I copied the code from here as is and made no changes. Worked in 3.7 but gave the type error in 2.7. Thank you for all of the help. – newdeveloper Feb 23 '23 at 15:20

Using simplejson instead of the standard library

The third-party simplejson library is the original basis of Python's standard library JSON support; however, it is actively maintained by the original developer, and the standard library uses very old versions, relatively speaking. For example, in Python 3.8, the standard library appears to be based on simplejson 2.0.9; as of posting, the latest version of simplejson is 3.18.3.

Using an up-to-date version of simplejson, we can simply specify the sort key as item_sort_key:

import simplejson as json
import re

# Again using Claudiu's implementation
def natural_sort_key(s, _nsre=re.compile('([0-9]+)')):
    return [int(text) if text.isdigit() else text.lower()
            for text in _nsre.split(s)]

print(json.dumps(data, item_sort_key=natural_sort_key, indent=2))
Karl Knechtel