
I am trying to raise an error if the user enters a duplicate key in a dictionary. The dictionary is in a file, and the user can edit the file manually.

Example:

dico= {'root':{
                'a':{'some_key':'value',...},
                'b':{'some_key':'value',...},
                'c':{'some_key':'value',...},
                ...

                'a':{'some_key':'value',...},
              }
      }

The new key 'a' already exists...

How can I test dico and warn the user when I load dico from the file?

Thammas
  • How are you loading the dictionary from the file? – Hugh Bothwell Feb 15 '11 at 01:59
  • @HughBothwell: with `from x import dico` – Thammas Feb 15 '11 at 02:35
  • As @JohnMachin pointed out, it is a big mistake to directly execute code edited by users: you are allowing the users to do anything with your application! In your case you should use an established data format instead, such as JSON or YAML; there are libraries for that. If you need to use the Python syntax, you have to parse the file explicitly rather than execute it. Because the question is so old, this warning is mainly intended for new readers. – pabouk - Ukraine stay strong Aug 05 '20 at 07:04
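If you follow the advice to switch to JSON, note that Python's json module silently keeps the last value for a repeated key by default. However, `json.loads` accepts an `object_pairs_hook` that receives every JSON object as a list of (key, value) pairs before they are merged, so a hook can refuse repeats. A minimal sketch; the hook name `reject_duplicates` and the sample text are illustrative, not part of the question.

```python
import json


def reject_duplicates(pairs):
    """object_pairs_hook: build a dict, raising on the first repeated key."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError("duplicate key {!r}".format(key))
        result[key] = value
    return result


# Sample settings text with the key "a" repeated inside "root".
text = '{"root": {"a": {"some_key": "value"}, "b": {"some_key": "value"}, "a": {"some_key": "value"}}}'

try:
    dico = json.loads(text, object_pairs_hook=reject_duplicates)
except ValueError as exc:
    # Malformed JSON also lands here, since json.JSONDecodeError subclasses ValueError.
    print(exc)
```

The hook is called for every object, nested ones included, so a duplicate at any level of the dict-of-dicts is reported.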

5 Answers


Write a subclass of dict and override __setitem__ so that it raises an error when an existing key would be replaced; then rewrite the file to use your new subclass's constructor instead of the built-in dict literal syntax.

from collections.abc import Iterable, Mapping

class Dict(dict):
    """A dict that raises ValueError instead of silently replacing a key."""

    def __init__(self, inp=None):
        super().__init__()
        if isinstance(inp, Mapping):
            # A mapping cannot hold duplicates itself; use its (key, value) items.
            inp = inp.items()
        if isinstance(inp, Iterable):
            # Send every (key, value) pair through __setitem__ so that
            # duplicates inside the input are caught as well.
            for k, v in inp:
                self[k] = v

    def __setitem__(self, k, v):
        if k in self:
            raise ValueError("duplicate key '{0}' found".format(k))
        super().__setitem__(k, v)

Then your file will have to be written as

dico = Dict([
    ('root', Dict([
        ('a', Dict([
            ('some_key', 'value'),
            ('another_key', 'another_value'),
        ])),
        ('b', Dict([
            ('some_key', 'value'),
        ])),
        ('c', Dict([
            ('some_key', 'value'),
            ('another_key', 'another_value'),
        ])),

        # ...
    ])),
])

This uses (key, value) tuples instead of dict literals in the imported file: written with the {} notation, the data would go through the default dict constructor first, and the duplicates would disappear before the Dict constructor ever got them!

Hugh Bothwell
  • This is the best solution! It raises the expected exception not only when you try to add items one by one, but also when you convert a list of tuples with repeated first items into a dictionary: Dict([(1, 2), (3, 4), (1, 6)]). – jciloa Jun 29 '16 at 17:50
  • Minor suggestion: If using – RDK May 03 '19 at 06:29
  • Hello, shouldn't the for loop in the part aimed at ensuring initial consistency of the dictionary be something like `for k,v in inp.items():` ? (Not sure how that pans out for mappings though, but I'll leave that to others.) – brezniczky Jul 18 '19 at 11:43
  • I don't know if I'm missing something, but why the `try`-`except` block in the definition of `__setitem__`? Writing an `if`-`else` block looks more natural to me: `if k in self: raise ...; else: super().__setitem__(k, v)`. The `else` clause may be omitted as well, but that is a matter of taste. – ruancomelli Sep 07 '20 at 13:54

If you want to ensure that an error is raised during dict construction with duplicate keys, just leverage Python's native keyword argument checking:

>>> dict(a={}, a={})
SyntaxError: keyword argument repeated

Unless I'm missing something, there is no need to subclass dict.
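Because this check happens when the source is compiled, a file written in this style is rejected on import before any of it runs. A minimal sketch using the built-in `compile()` on a made-up version of the user's file; it assumes the file is rewritten with `dict(...)` calls, so every key must be a valid Python identifier.

```python
# Hypothetical contents of the user-edited file, using keyword arguments as keys.
source = """
dico = dict(root=dict(
    a=dict(some_key='value'),
    b=dict(some_key='value'),
    a=dict(some_key='value'),
))
"""

try:
    compile(source, "x.py", "exec")  # compiles only; nothing is executed
except SyntaxError as exc:
    print(exc.msg)  # the message mentions "keyword argument repeated"
```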

Jian
  • Why isn't this the accepted answer? This is a super-pythonic way to handle the issue. – Pavel Brodsky Jan 22 '20 at 15:14
  • It is important to note a few things regarding this answer: (1) it forbids duplicate keys upon *construction* but allows overwriting keys afterwards (that is, `d['a'] = 'new_value'` is OK); and (2) it requires all keys to be strings that are valid Python identifiers, whereas subclassing allows other types of keys, such as `tuple`s, to be used. Nonetheless, this is also my go-to approach, and it seems to fit the OP's requirements best. – ruancomelli Sep 07 '20 at 13:49
  • This also does not work if keys are possibly not strings: `dict(1="foo")` and `dict(None="foo")` raise an exception, but both `{None: "foo"}` and `{1: "foo"}` work. – Martim May 12 '21 at 14:49

You will need a custom dict that raises ValueError if the key is already present.

class RejectingDict(dict):
    def __setitem__(self, k, v):
        if k in self.keys():
            raise ValueError("Key is already present")
        else:
            return super(RejectingDict, self).__setitem__(k, v)

Here is how it works.

>>> obj = RejectingDict()
>>> obj[1] = True
>>> obj[2] = False
>>> obj
{1: True, 2: False}
>>> obj[1] = False
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "rejectingdict.py", line 4, in __setitem__
    raise ValueError("Key is already present")
ValueError: Key is already present
Senthil Kumaran
  • note that k in self.keys() is O(n), you should probably use `in self` directly (didn't check) – iggy Jul 02 '15 at 15:36
  • This does not raise the expected exception when you convert a list of tuples with repeated first items into a dictionary: RejectingDict([(1, 2), (3, 4), (1, 6)]). The accepted solution (by Hugh Bothwell) works for that case as well. – jciloa Jun 29 '16 at 17:49
  • oh, using self.keys() is too slow – spiritwolfform Sep 01 '16 at 13:11
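To see why the constructor case mentioned in the comments slips through: the built-in dict constructor (and `dict.update()`) store items without calling an overridden `__setitem__`. A minimal sketch; `StrictRejectingDict` is a hypothetical name for a variant that routes the constructor's pairs through `__setitem__`.

```python
class RejectingDict(dict):
    def __setitem__(self, k, v):
        if k in self:  # membership test on the dict itself
            raise ValueError("Key is already present")
        super().__setitem__(k, v)


class StrictRejectingDict(RejectingDict):
    """Hypothetical variant that also checks the pairs given to the constructor."""

    def __init__(self, pairs=()):
        super().__init__()
        for k, v in pairs:
            self[k] = v  # goes through the overridden __setitem__


# The inherited constructor bypasses __setitem__, so the repeated key 1 collapses:
print(RejectingDict([(1, 2), (3, 4), (1, 6)]))  # {1: 6, 3: 4}

try:
    StrictRejectingDict([(1, 2), (3, 4), (1, 6)])
except ValueError as exc:
    print(exc)  # Key is already present
```

`update()` and `setdefault()` are not covered by this sketch; they would need the same treatment if the dict is modified that way.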

WRONG WAY
GO BACK

`from x import dico` is not a very good idea: you are letting USERS edit code, which you then execute blindly. You run the risk of anything from simple typos causing a syntax error up to malicious stuff like `import os; os.system("rm whatever"); dico = {}`.

Don't faff about with subclassing dict. Write your own dict-of-dicts loader. It's not that hard: read the data file and check before each insertion whether the key already exists; if it does, log an error message with meaningful details such as the line number, the duplicate key and its value. At the end, if there have been any errors, raise an exception. You may find an existing module that does all that, although the ConfigParser module supplied with Python (renamed configparser in Python 3) doesn't seem to be what you want.
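A minimal sketch of such a loader, assuming the file keeps its current Python-literal syntax. Instead of importing the file, it parses it with the standard-library `ast` module (a technique swapped in here; the answer does not name a parser), collects every duplicate key with its line number, and only then evaluates the literal with `ast.literal_eval`, which never runs arbitrary code. The function name `load_dico` and the assumption that the data is assigned to a top-level name `dico` are hypothetical.

```python
import ast


def load_dico(source, filename="x.py"):
    """Parse `source` without executing it and return the value assigned to `dico`.

    Raises ValueError listing every duplicate key found in any dict literal.
    """
    tree = ast.parse(source, filename=filename)
    errors = []
    for node in ast.walk(tree):  # visits nested dict literals too
        if isinstance(node, ast.Dict):
            seen = {}  # key -> line where it first appeared in this literal
            for key_node in node.keys:
                if key_node is None:  # `**mapping` unpacking has no key node
                    continue
                key = ast.literal_eval(key_node)
                if key in seen:
                    errors.append("line {}: duplicate key {!r} (first defined on line {})"
                                  .format(key_node.lineno, key, seen[key]))
                else:
                    seen[key] = key_node.lineno
    if errors:
        raise ValueError("\n".join(errors))
    for stmt in tree.body:
        if (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
                and isinstance(stmt.targets[0], ast.Name)
                and stmt.targets[0].id == "dico"):
            return ast.literal_eval(stmt.value)  # literals only, no code execution
    raise ValueError("no assignment to 'dico' found")
```

Because nothing in the file is executed, a line such as `import os` outside the `dico` assignment is simply ignored rather than run.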

By the way, isn't having a single 'root' key at the top level rather pointless?

John Machin
  • Thanks for your comment. `dico` is actually a settings file. As I'm a Python beginner and don't understand all the code in the answers, I think I will use ConfigParser instead... – Thammas Feb 15 '11 at 03:19
  • @Thammas: Huh? (1) """dico is actually a settings file""": According to your question, `dico` is the name of a 3-level source code dictionary [you hope!!] in a file called `x.py` (2) What gives you the impression that ConfigParser supports duplicate detection with meaningful error messages? – John Machin Feb 15 '11 at 03:56
  • You are right, dico is a dictionary in file.py... I misspoke. You're right about ConfigParser too! I will try to study the code provided in the answers. – Thammas Feb 15 '11 at 04:41
  • I've downvoted this answer because the author asked for a solution to a given problem, not about good practice. If you start an answer by saying "do not do that", it's not an answer, it's lecturing. That has its own purpose and is not bad, but Stack Overflow is all about answers, especially when you have limited or even no knowledge of what Thammas's project is. – Drachenfels Dec 26 '13 at 17:40

Python's default behavior is to silently overwrite duplicates when declaring a dictionary.
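A quick illustration with arbitrary keys: no error is raised, the later value wins, and (in Python 3.7+, where dicts preserve insertion order) the key keeps the position of its first occurrence.

```python
# A dict literal that repeats the key 'a'.
dico = {'a': 1, 'b': 2, 'a': 3}
print(dico)  # {'a': 3, 'b': 2}
```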

You could create your own dictionary class that checks whether an item is already in the dictionary before adding new elements, and then use it. But then you would have to change your declaration of dico in that file to something that keeps duplicates instead of collapsing them, such as a list of tuples.

Then on loading that data file, you'd parse it into your special 'subclassed' dict.
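A minimal sketch of that loading step, using a plain function instead of a subclass. The name `pairs_to_dict` is hypothetical, and it assumes a nested layout in which any value that is a list is itself a list of (key, value) pairs; genuine list values would need a different marker.

```python
def pairs_to_dict(pairs):
    """Build a dict from (key, value) pairs, refusing duplicate keys.

    Assumption: a value that is a list is a nested list of pairs and is
    converted recursively.
    """
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError("duplicate key {!r}".format(key))
        if isinstance(value, list):
            value = pairs_to_dict(value)
        result[key] = value
    return result


# What the data file could contain instead of nested dict literals.
dico_pairs = [
    ('root', [
        ('a', [('some_key', 'value')]),
        ('b', [('some_key', 'value')]),
    ]),
]
dico = pairs_to_dict(dico_pairs)
```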

Jesse Cohen