
I have a JSON string with duplicate keys. I already know how to collect those duplicates into an array so that all of them are preserved when calling json.loads(string); see e.g. https://stackoverflow.com/a/61416136/7471760.

My question now: in some cases I get very odd input JSON strings (which I cannot change), where I also need to preserve the order across 'different' sets of duplicates. For example:

jstring = '\
{\
   "anna": { "age": 23, "color": "green"},\
   "john": { "age": 35, "color": "blue"},\
   "laura":{ "age": 32, "color": "red"},\
   "john": { "age": 31, "color": "black"},\
   "anna": { "age": 41, "color": "pink"}\
}'

I use this hook to convert the string into a JSON object without losing the duplicates (different students):

def array_on_duplicates(ordered_pairs):
    """Convert duplicate keys to arrays."""
    d = {}
    for k, v in ordered_pairs:
        if k in d:
            if isinstance(d[k], list):
                d[k].append(v)
            else:
                d[k] = [d[k], v]
        else:
            d[k] = v
    return d

However, I still need to recover the original order of these students (first in, first out).

When using json.loads(), I have all entries, but I lose that original order:

json.loads(jstring, object_pairs_hook=array_on_duplicates)
{'anna': [{'age': 23, 'color': 'green'}, {'age': 41, 'color': 'pink'}],
 'john': [{'age': 35, 'color': 'blue'}, {'age': 31, 'color': 'black'}],
 'laura': {'age': 32, 'color': 'red'}}

What would be the most efficient way around this problem (apart from changing the cumbersome input string, which unfortunately I cannot do)?

ferdymercury

1 Answer


One workaround I thought of was to generate a unique key for each duplicate (anna_1, anna_2, etc.), as suggested here: https://stackoverflow.com/a/29323197/7471760, so that every entry is unique, and then store the pairs in an OrderedDict.
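A minimal sketch of that renaming idea, for illustration only. The hook name suffix_duplicates and the underscore-number suffix format are my own assumptions, not taken from the linked answer, and a plain dict stands in for OrderedDict because dicts keep insertion order in Python 3.7+. The input is shortened to the age field.

```python
import json
from collections import Counter

def suffix_duplicates(ordered_pairs):
    """Rename repeated keys to key_1, key_2, ... so every pair keeps its input position."""
    totals = Counter(k for k, _ in ordered_pairs)  # how often each key occurs in this object
    seen = Counter()
    d = {}
    for k, v in ordered_pairs:
        if totals[k] > 1:
            seen[k] += 1
            d[f"{k}_{seen[k]}"] = v  # hypothetical naming scheme
        else:
            d[k] = v
    return d

jstring = (
    '{"anna": {"age": 23}, "john": {"age": 35}, "laura": {"age": 32},'
    ' "john": {"age": 31}, "anna": {"age": 41}}'
)
renamed = json.loads(jstring, object_pairs_hook=suffix_duplicates)
print(list(renamed))
```

One caveat with this approach: a generated name could collide with a key that already exists in the input (a literal "anna_1", say), and the original name has to be split off the suffix again later.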

Another option would be to return the key-value tuples directly from the hook and process them later: https://stackoverflow.com/a/29322077/7471760.
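A rough sketch of the tuple-based option, assuming the same student data as in the question. The hook name keep_pairs is hypothetical. Since json.loads applies the hook to every object, including the nested ones, the inner records also arrive as lists of pairs and are turned back into dicts afterwards.

```python
import json

def keep_pairs(ordered_pairs):
    """Return the (key, value) pairs as-is instead of building a dict."""
    return ordered_pairs

jstring = (
    '{"anna": {"age": 23, "color": "green"},'
    ' "john": {"age": 35, "color": "blue"},'
    ' "laura": {"age": 32, "color": "red"},'
    ' "john": {"age": 31, "color": "black"},'
    ' "anna": {"age": 41, "color": "pink"}}'
)

pairs = json.loads(jstring, object_pairs_hook=keep_pairs)

# The nested objects are lists of pairs too; rebuild them as dicts.
students = [(name, dict(fields)) for name, fields in pairs]

for name, fields in students:
    print(name, fields)
```

This keeps the input order without any extra bookkeeping, at the cost of losing the dict/array structure at the top level.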

However, keeping the array structure was quite useful for me, so what suited me best was this workaround, which stores the order explicitly in an extra key:

def array_on_duplicates_keep_order(ordered_pairs):
    """Convert duplicate keys to arrays and store order on an extra key."""
    # https://www.semicolonworld.com/question/56998/python-json-parser-allow-duplicate-keys
    # https://stackoverflow.com/questions/14902299/json-loads-allows-duplicate-keys-in-a-dictionary-overwriting-the-first-value
    d = {}
    for order, (k, v) in enumerate(ordered_pairs):
        if isinstance(v, dict):
            # Position of this pair within its parent object (mutates v in place).
            v['o'] = order
        if k in d:
            if isinstance(d[k], list):
                d[k].append(v)
            else:
                d[k] = [d[k], v]
        else:
            d[k] = v
    return d

which produces:

jobj = json.loads(jstring, object_pairs_hook=array_on_duplicates_keep_order)
{'anna': [{'age': 23, 'color': 'green', 'o': 0},
  {'age': 41, 'color': 'pink', 'o': 4}],
 'john': [{'age': 35, 'color': 'blue', 'o': 1},
  {'age': 31, 'color': 'black', 'o': 3}],
 'laura': {'age': 32, 'color': 'red', 'o': 2}}

Finally, I can recover the original order of students by using a named tuple and sorting by the order key:

from typing import NamedTuple

class Student(NamedTuple):
    name: str
    age: int
    color: str
    o: int

studentList = []
for k, v in jobj.items():
    # A key seen once maps to a single dict; a duplicated key maps to a list of dicts.
    entries = v if isinstance(v, list) else [v]
    for s in entries:
        studentList.append(Student(k, s['age'], s['color'], s['o']))

orderedList = sorted(studentList, key=lambda s: s.o)

This gives me what I wanted, without changing the input and while still using the parsed JSON object as intermediate storage:

studentList
[Student(name='anna', age=23, color='green', o=0),
 Student(name='anna', age=41, color='pink', o=4),
 Student(name='john', age=35, color='blue', o=1),
 Student(name='john', age=31, color='black', o=3),
 Student(name='laura', age=32, color='red', o=2)]

orderedList
[Student(name='anna', age=23, color='green', o=0),
 Student(name='john', age=35, color='blue', o=1),
 Student(name='laura', age=32, color='red', o=2),
 Student(name='john', age=31, color='black', o=3),
 Student(name='anna', age=41, color='pink', o=4)]
ferdymercury