JSON allows duplicate keys in objects. By default, Python's JSON parser keeps only the last value for a duplicated key, but it has a mechanism for overriding this behaviour, which I need to use to detect and report such duplicates. Basically, the code will be something like load(file_pointer, object_pairs_hook=report_duplicates).
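For context, here is a minimal sketch of what such a hook might look like. The body of report_duplicates is an assumption (the real one is not shown here); this version simply raises ValueError when a key occurs more than once in any object, and otherwise returns a normal dict.

```python
import json
from collections import Counter
from io import StringIO


def report_duplicates(pairs):
    """Hypothetical hook: reject any JSON object that repeats a key."""
    # `pairs` is the list of (key, value) tuples for one decoded object.
    counts = Counter(key for key, _ in pairs)
    duplicates = sorted(key for key, count in counts.items() if count > 1)
    if duplicates:
        raise ValueError("Duplicate keys: " + ", ".join(duplicates))
    return dict(pairs)


# No duplicates: behaves like the default decoder.
clean = json.load(StringIO('{"a": 1, "b": {"c": 2}}'),
                  object_pairs_hook=report_duplicates)

# Duplicates: the hook raises instead of silently keeping the last value.
try:
    json.load(StringIO('{"x": 1, "x": 2}'),
              object_pairs_hook=report_duplicates)
except ValueError as error:
    message = str(error)
```

The hook is called once per JSON object, innermost first, so duplicates in nested objects are caught as well.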

To test this behaviour I need to create a JSON object with duplicate keys. Is there some simple way to do that in Python 3.8 without hand-coding the JSON string? For example, could I convert [("x", 1), ("x", 2)] into the JSON string {"x": 1, "x": 2}? (The order doesn't matter.) The reason is that I need the JSON object to have a specific structure, so the test will be noisy if I have to type the whole thing into a string.

Since this is specifically about duplicate keys I can't use solutions like dumps(dict(…)).


So far I have this:

from json import JSONEncoder, dumps
from typing import Any, Iterator


class KeyValueList(list):
    """A list of (key, value) pairs to be serialized as a JSON object."""


class KeyValueListEncoder(JSONEncoder):
    def iterencode(self, obj: Any, _one_shot: bool = False) -> Iterator[str]:
        if isinstance(obj, KeyValueList):
            # Emit the pairs as object members, keeping duplicate keys.
            yield "{"

            max_index = len(obj) - 1
            for index, (key, value) in enumerate(obj):
                yield JSONEncoder.encode(self, key)
                yield self.key_separator
                yield JSONEncoder.encode(self, value)
                if index != max_index:
                    yield self.item_separator

            yield "}"
        else:
            # Everything else goes through the standard encoder.
            yield from JSONEncoder.iterencode(self, obj, _one_shot=_one_shot)

This works if the top-level object being serialized is a KeyValueList, but if the KeyValueList is nested in another data structure it's serialized as a list:

>>> dumps(KeyValueList((("x", 1), ("x", 2))), cls=KeyValueListEncoder)
'{"x": 1, "x": 2}'
>>> dumps({"foo": KeyValueList((("x", 1), ("x", 2)))}, cls=KeyValueListEncoder)
'{"foo": [["x", 1], ["x", 2]]}'

(I can't override default as per the documentation because default is only called when a type is not already serializable.)
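To illustrate that point, here is a small sketch with a hypothetical RecordingEncoder that logs every call to default. Because KeyValueList subclasses list, the encoder treats it as an ordinary list and never consults default.

```python
import json


class KeyValueList(list):
    pass


class RecordingEncoder(json.JSONEncoder):
    """Hypothetical encoder that records which types reach default()."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.seen = []

    def default(self, o):
        self.seen.append(type(o).__name__)
        return super().default(o)


encoder = RecordingEncoder()
result = encoder.encode({"foo": KeyValueList([("x", 1), ("x", 2)])})
# result is '{"foo": [["x", 1], ["x", 2]]}' and encoder.seen stays empty,
# because list subclasses (and tuples) are serialized natively.
```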

l0b0
  • You probably will have to create a custom `list` type (to not conflate with standard handling of `list`) to store the list of 2-tuples, and follow the solution presented in [How to make a class JSON serializable](https://stackoverflow.com/questions/3768895/how-to-make-a-class-json-serializable). – metatoaster Apr 07 '21 at 03:31
  • Clearly you can create a dict that is close to your needs with, say, "x1" and "x2", then bring it in to a text editor and patch it. That has to be the easiest path. – Tim Roberts Apr 07 '21 at 03:39
  • @TimRoberts That doesn't make for nice code though, and doesn't scale. – l0b0 Apr 07 '21 at 23:06
  • I disagree, the code would be MUCH nicer. You could, I suppose, use a magic marker for your duplicates, like `x__1__`, `x__2__`, and so on. Then, after the `dumps` you could use simple string substitution to remove the magic markers. – Tim Roberts Apr 07 '21 at 23:11
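
The marker idea from the last comment could be sketched roughly as follows. The helper name dumps_with_duplicates and the exact marker format (`__<number>__` at the end of a key) are assumptions for illustration, and the regular expression relies on the default separators of `dumps`, where a key's closing quote is directly followed by a colon.

```python
import json
import re

# Assumed marker format: "__<digits>__" at the very end of a key,
# i.e. immediately before the closing quote and the colon.
_MARKER = re.compile(r'__\d+__(?=":)')


def dumps_with_duplicates(obj):
    """Serialize obj, then strip the markers so the keys collide."""
    return _MARKER.sub("", json.dumps(obj))


text = dumps_with_duplicates({"foo": {"x__1__": 1, "x__2__": 2}})
# text is '{"foo": {"x": 1, "x": 2}}'
```

This works at any nesting depth because the substitution runs on the finished string, but it would also strip a marker-like suffix from any real key that happens to end the same way.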
