In Python, I am looking for a way to get a warning or error when loading a statically declared dictionary that contains duplicate keys from a file. In my use case the file comes from user input, so I want to make sure the dictionaries I receive don't have duplicate keys. I understand that after the load dictionary1 is the same as dictionary2, because Python dictionaries keep the rightmost key/value pair. What I am looking for is a way to get a warning or error before or during the load that indicates that dictionary1 had duplicate "a" keys.

dictionary1 = {"a":1, "a":2, "a":3}
dictionary2 = {"a":3}

The best idea I can think of is to use a list of dictionaries and add each one to a final dictionary, as in the example below. This works, but a list of dictionaries is not as user-friendly as a standard dictionary.

listofDicts = [{"a":1},{"a":2},{"a":3}]
masterDict = {}
for entry in listofDicts:
    for subDict in entry:
        if subDict in masterDict.keys():
            print ("ERROR key \"%s\" already exists with value %d" % (subDict, masterDict[subDict]))
        else:
            masterDict.update({subDict:entry[subDict]})
jprince14
  • What is the format of your file? – Erik Godard May 11 '17 at 04:21
  • As an aside, `if subDict in masterDict.keys():` is an anti-pattern. In Python 2 you are creating a `list` of all the keys in `masterDict` and then doing a membership test on that list, which is an O(N) operation; if you had just done `if subDict in masterDict` you would have had an O(1) membership test on the dictionary's keys. – juanpa.arrivillaga May 11 '17 at 04:28
  • Use `json.loads` to parse your file, then refer to the answer posted here: http://stackoverflow.com/questions/14902299/json-loads-allows-duplicate-keys-in-a-dictionary-overwriting-the-first-value – Ken Wei May 11 '17 at 04:30
  • The file format is just a statically declared .py file that contains the definition of the dictionary. A user may mistakenly use the same dictionary key multiple times, so I want to catch the mistake when I load it into my program that processes the input file. I tried using `json.loads`, but for a simple dictionary such as `dictionary1` from the question I receive `TypeError: expected string or buffer` – jprince14 May 11 '17 at 15:08
  • So with json I need the entire dictionary in a string, which I would like to avoid. Is there a way to encode the dictionary to json while checking for duplicate keys at the same time? – jprince14 May 11 '17 at 17:40
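
The `json.loads` suggestion in the comments above can be sketched as follows. This is only an illustration and it assumes the input is JSON text rather than a `.py` file; the function name `reject_duplicate_keys` is made up for this example. It relies on the `object_pairs_hook` parameter of `json.loads`, which receives each JSON object as a list of key/value pairs before any duplicates are collapsed. It also shows why passing a dict object fails: `json.loads` parses text, and a dict has already lost its duplicates by the time it exists.

```python
import json

def reject_duplicate_keys(pairs):
    """Hypothetical object_pairs_hook: build a dict, raising on a repeated key."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError("duplicate key: %r" % (key,))
        result[key] = value
    return result

# json.loads needs the *text* of the document, not a dict object.
text = '{"a": 1, "a": 2, "a": 3}'
try:
    json.loads(text, object_pairs_hook=reject_duplicate_keys)
except ValueError as exc:
    print("ERROR:", exc)
```

The hook is called for every JSON object, including nested ones, so a repeated key at any depth raises the error.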

1 Answer

You can use the ast module to parse the Python source code in your files containing the dictionaries and look for dictionary literals with duplicate keys:

import ast
import logging

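class DuplicateKeyVisitor(ast.NodeVisitor):
    """Warn about dictionary literals that repeat a literal key."""

    def visit_Dict(self, node):
        seen_keys = set()

        for key_node in node.keys:
            try:
                key = ast.literal_eval(key_node)
            except ValueError:
                # Not a plain literal (e.g. a variable name), so skip it.
                continue

            if key in seen_keys:
                logging.warning('Dictionary literal at (%d, %d) has duplicate keys', node.lineno, node.col_offset)

            seen_keys.add(key)

        # Keep walking so that nested dictionary literals are checked too.
        self.generic_visit(node)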

DuplicateKeyVisitor().visit(ast.parse('''
foo = {'a': 1, 'a': 2}
bar = {'a': 1, 'b': 2}
bar = {'a': 1, 'b': 2, 'a': 3}
'''))
Blender
  • Don't dictionaries drop duplicate keys once they are defined (keeping the last value)? Why is this necessary? – Vinícius Figueiredo May 11 '17 at 04:33
  • @ViníciusAguiar: They do, but the question asks for a way to know if a Python file you're loading contains dictionary literals that have duplicate keys. – Blender May 11 '17 at 04:35
  • I am looking for a method that doesn't require the entire dictionary to be within a string. Is there a way to apply a similar technique where the dictionary is declared like `a = {1: "a", 1:"b"}` as opposed to `a = """{1: "a", 1:"b"}"""`? – jprince14 May 11 '17 at 17:45
  • @jprince14: How is the dictionary created? Does the user edit your script file? – Blender May 11 '17 at 18:05
  • The user manually creates a .py file containing the statically declared dictionary, which gets written to a file using pickle, and then I load the dictionary using pickle. I want to give users something that checks their dictionaries for duplicates before the call to pickle.dump. I provide users with a template file which they fill out with their own dictionaries. – jprince14 May 11 '17 at 18:57
  • @jprince14: Don't pickle the dictionaries. By the time you pickle it, the dictionary object will have been created, so you can't tell if duplicate keys were overwritten or not. Just read the contents of the `.py` file before importing it and do `DuplicateKeyVisitor().visit(ast.parse(file_contents))`. – Blender May 11 '17 at 19:56
  • That worked for me. I am reading the text of the files that contain the dictionaries and checking them with `DuplicateKeyVisitor().visit(ast.parse(file_contents))`. I needed to use pickle because I am actually using jython and the dictionaries are too large, so I had to use subprocess.call to run the files containing the dictionaries, pickle dump them to a file, and then load them in my code. Thanks for the help! – jprince14 May 11 '17 at 20:17
  • @jprince14: That doesn't sound right. How are they too large? – Blender May 12 '17 at 00:16
  • It was a jython memory issue. When I tried to import the statically declared dictionary from another file, it failed with a memory-related error. When I looked it up, it turned out to be a known error with jython. – jprince14 May 12 '17 at 03:14
  • @Blender Here is more info on the jython error if you are interested, [post1](https://bugs.launchpad.net/sikuli/+bug/1216780) and [post2](https://www.thecodingforums.com/threads/jython-problem-with-an-huge-dictionary.723353/). – jprince14 May 12 '17 at 13:34
  • @jprince14: wow, I never knew that. Can you disable the generation of class files to avoid this bug? – Blender May 12 '17 at 20:34
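
The workflow settled on in the comments above (check the text of the `.py` file before it is imported or pickled) could look roughly like the sketch below. It is a variant of the answer's visitor, not the answer's own code: `find_duplicate_keys` is a hypothetical helper that returns the duplicates instead of logging them, so the caller can decide whether to warn or abort. It uses `ast.walk`, which also reaches nested dictionary literals, and it skips keys that are not hashable literals.

```python
import ast

def find_duplicate_keys(source):
    """Return (lineno, col_offset, key) for each repeated literal dict key."""
    duplicates = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Dict):
            continue
        seen = set()
        for key_node in node.keys:
            if key_node is None:
                # "**other" unpacking inside a literal has no key node.
                continue
            try:
                key = ast.literal_eval(key_node)
                hash(key)
            except (ValueError, TypeError, SyntaxError):
                # Computed or unhashable key: cannot be compared statically.
                continue
            if key in seen:
                duplicates.append((node.lineno, node.col_offset, key))
            seen.add(key)
    return duplicates

# The source text stands in for the contents of the user's .py file.
source = 'a = {1: "a", 1: "b"}\nb = {"x": {"k": 1, "k": 2}}\n'
for lineno, col, key in find_duplicate_keys(source):
    print("line %d, column %d: duplicate key %r" % (lineno, col, key))
```

In practice the `source` string would come from reading the user's file, and a non-empty result would be reported before the file is executed or its dictionary is dumped with pickle. The positions refer to the start of the dictionary literal, not to the repeated key itself.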