I am trying to deep-sort a JSON document made of arbitrarily nested lists and objects. The objects can contain duplicate keys, and there is no specific key in the JSON to sort on.

Input:

{"payload": [
  {
    "a": {
      "aa": [
        {
          "aa12": {
            "aaa23": 230,
            "aaa21": 210,
            "aaa.something": "yes"
          }
        },
        {
          "aa11": {
            "aaa12": 120,
            "aaa11": 110
          }
        },
        {
          "aa13": {
            "aaa35": 350,
            "aaa32": 320,
            "aaa.someattr": "true"
          }
        }
      ],
      "aa": [
        {
          "aa12": {
            "aaa22": 22,
            "aaa21": 21
          }
        },
        {
          "aa10": {
            "aaa03": 3,
            "aaa01": 1
          }
        },
        {
          "aa13": {
            "aaa33": 33,
            "aaa32": 32
          }
        },
        {
          "aa1": "aab"
        }
      ],
      "ac": [
        "ac3",
        "ac1",
        "ac2"
      ]
    }
  },
  {
    "b": {
      "bb": [
        "bb4",
        "bb2",
        "bb3",
        "bb1"
      ]
    }
  }
]}

Expected Output:

{"payload": [
  {
    "a": {
      "aa": [
        {
          "aa1": "aab"
        },
        {
          "aa10": {
            "aaa01": 1,
            "aaa03": 3
          }
        },
        {
          "aa12": {
            "aaa21": 21,
            "aaa22": 22
          }
        },
        {
          "aa13": {
            "aaa32": 32,
            "aaa33": 33
          }
        }
      ],
      "aa": [
        {
          "aa11": {
            "aaa11": 110,
            "aaa12": 120
          }
        },
        {
          "aa12": {
            "aaa.something": "yes",
            "aaa21": 210,
            "aaa23": 230
          }
        },
        {
          "aa13": {
            "aaa.someattr": "true",
            "aaa32": 320,
            "aaa35": 350
          }
        }
      ],
      "ac": [
        "ac1",
        "ac2",
        "ac3"
      ]
    }
  },
  {
    "b": {
      "bb": [
        "bb1",
        "bb2",
        "bb3",
        "bb4"
      ]
    }
  }
]}

I have tried using the below recursive method:

def sorted_deep(d):
    # Sort list items and dict keys recursively; scalars are returned as-is.
    if isinstance(d, list):
        return sorted(sorted_deep(v) for v in d)
    if isinstance(d, dict):
        return {k: sorted_deep(d[k]) for k in sorted(d)}
    return d

ls = {'payload': [{'a': {'aa': [{'aa12': {'aaa23': 230, 'aaa21': 210}}, {'aa11': {'aaa12': 120, 'aaa11': 110}}, {'aa13': {'aaa35': 350, 'aaa32': 320}}], 'ac': ['ac3', 'ac1', 'ac2'], 'aa': [{'aa12': {'aaa22': 22, 'aaa21': 21}}, {'aa10': {'aaa03': 3, 'aaa01': 1}}, {'aa13': {'aaa33': 33, 'aaa32': 32}}, {'aa1': 'aab'}]}}, {'b': {'bb': ['bb4', 'bb2', 'bb3', 'bb1']}}]}
output = sorted_deep(ls)
print(output)

But this doesn't work: when it sorts, the value of a duplicate key is overwritten by the last value found. Since the duplicate key can be any string, I can't iterate by naming a specific key. I'm looking for a generic solution that sorts any complex JSON document with nested lists and objects.
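As a small illustration of where the duplicate is lost (a sketch on a made-up two-key document, not the payload above): a Python dict can hold each key only once, so the collapse already happens when the dict literal or `json.loads` builds the dict, before any sorting runs. The standard `object_pairs_hook` parameter of `json.loads` hands over every key/value pair instead.

```python
import json

# Hypothetical miniature input with a duplicated "aa" key.
raw = '{"a": {"aa": [1], "aa": [2]}}'

# Default parsing builds a dict, so the second "aa" replaces the first.
as_dict = json.loads(raw)
print(as_dict)  # {'a': {'aa': [2]}}

# object_pairs_hook receives each object's (key, value) pairs in document
# order; returning them as a list keeps both "aa" entries.
as_pairs = json.loads(raw, object_pairs_hook=list)
print(as_pairs)  # [('a', [('aa', [1]), ('aa', [2])])]
```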

My end goal is to deep-compare two such JSON documents to find the differences.

Sameer Mirji
  • `JSON` refers to a string. Is this a string? Whatever it is, it is not anything valid: It starts with `{` and ends with `]`. – Booboo Jan 18 '20 at 11:31
  • @Booboo: I've corrected the JSON formatting now. – Sameer Mirji Jan 18 '20 at 11:41
  • This does not answer your question, but it addresses your end goal. Diff-finding algorithms are slow, and even slower when you run them on tens of thousands of lines in a production environment (in case you are planning to). Instead of reinventing the code, I would recommend using this: https://github.com/google/diff-match-patch – Mohit Rustagi Jan 18 '20 at 17:31
  • @MohitRustagi: Thanks for the diff-match-patch library. However, it doesn't sort before comparing complex JSON documents. This will yield incorrect results when there are lists of dicts of lists (further nested). – Sameer Mirji Jan 19 '20 at 05:11
  • @SameerMirji, fyi the last code block throws a syntax error after your last edit due to parenthesis mismatch. – hilberts_drinking_problem Jan 19 '20 at 05:49
  • Also, duplicate key values in JSON are discussed [here](https://stackoverflow.com/questions/21832701/does-json-syntax-allow-duplicate-keys-in-an-object) and [here](https://stackoverflow.com/questions/14902299/json-loads-allows-duplicate-keys-in-a-dictionary-overwriting-the-first-value). – hilberts_drinking_problem Jan 19 '20 at 06:04
  • @hilberts_drinking_problem: I have fixed it now. The links explain how we can have duplicate keys in JSON. However, I would like to know how to deep sort them. – Sameer Mirji Jan 19 '20 at 12:43

1 Answer

Consider using sorted():

sorted(paymentsByAgreeement[agreement['agreementId']], key=lambda i: (i['eventDate'], i['id']))

Read about sorted and lambda here: https://wiki.python.org/moin/HowTo/Sorting. With a lambda you can access child elements; to go deeper, you may need a for loop and another sorted call with its own lambda.
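For the nested case with duplicate keys, one possible approach is sketched below. It assumes the input is available as JSON text (a dict has already lost the duplicates) and that any deterministic order is acceptable. The names `Obj`, `sort_key`, `sorted_deep` and `load_sorted` are made up for this example. Each object is kept as a tuple of (key, value) pairs, and items are ordered by their serialized text, so no particular key has to be named.

```python
import json


class Obj(tuple):
    """A JSON object kept as (key, value) pairs, so duplicate keys survive."""


def sort_key(node):
    # Serialize to text so objects, lists, strings and numbers are all
    # comparable. Note that this orders numbers textually ("10" before "2").
    return json.dumps(node)


def sorted_deep(node):
    if isinstance(node, Obj):
        # Sort the pairs by key first, then by the sorted value's text.
        return Obj(sorted(((k, sorted_deep(v)) for k, v in node), key=sort_key))
    if isinstance(node, list):
        return sorted((sorted_deep(v) for v in node), key=sort_key)
    return node


def load_sorted(text):
    # object_pairs_hook passes every pair of every object, duplicates included.
    return sorted_deep(json.loads(text, object_pairs_hook=Obj))


# Two documents that differ only in ordering compare equal once normalized.
left = load_sorted('{"x": [3, 1, 2], "k": {"b": 1}, "k": {"a": 2}}')
right = load_sorted('{"k": {"a": 2}, "x": [2, 3, 1], "k": {"b": 1}}')
print(left == right)  # True
```

Because the order comes from serialized text, it is stable but not necessarily the order a human would choose. For the stated goal of comparing two documents, that is enough, since both sides are normalized the same way.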

Giorgi Beria