106

I have a text file which contains duplicate car registration numbers with different values, like so:

EDF768, Bill Meyer, 2456, Vet_Parking
TY5678, Jane Miller, 8987, AgHort_Parking
GEF123, Jill Black, 3456, Creche_Parking
ABC234, Fred Greenside, 2345, AgHort_Parking
GH7682, Clara Hill, 7689, AgHort_Parking
JU9807, Jacky Blair, 7867, Vet_Parking
KLOI98, Martha Miller, 4563, Vet_Parking
ADF645, Cloe Freckle, 6789, Vet_Parking
DF7800, Jacko Frizzle, 4532, Creche_Parking
WER546, Olga Grey, 9898, Creche_Parking
HUY768, Wilbur Matty, 8912, Creche_Parking
EDF768, Jenny Meyer, 9987, Vet_Parking
TY5678, Jo King, 8987, AgHort_Parking
JU9807, Mike Green, 3212, Vet_Parking

I want to create a dictionary from this data, which uses the registration numbers (first column) as keys and the data from the rest of the line for values.

I wrote this code:

data_dict = {}
data_list = []

def createDictionaryModified(filename):
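    # Raw string: otherwise "\U" in the Windows path is parsed as a Unicode escape
    path = r"C:\Users\user\Desktop"
    basename = "ParkingData_Part3.txt"
    filename = path + "//" + basename
    # The with block closes the file once it has been read
    with open(filename) as file:
        contents = file.read()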
    print(contents,"\n")
    data_list = [lines.split(",") for lines in contents.split("\n")]
    for line in data_list:
        regNumber = line[0]
        name = line[1]
        phoneExtn = line[2]
        carpark = line[3].strip()
        details = (name,phoneExtn,carpark)
        data_dict[regNumber] = details
    print(data_dict,"\n")
    print(data_dict.items(),"\n")
    print(data_dict.values())

The problem is that the data file contains duplicate values for the registration numbers. When I try to store them in the same dictionary with data_dict[regNumber] = details, the old value is overwritten.

How do I make a dictionary with duplicate keys?


Sometimes people want to "combine" or "merge" multiple existing dictionaries by just putting all the items into a single dict, and are surprised or annoyed that duplicate keys are overwritten. See the related question How to merge dicts, collecting values from matching keys? for dealing with this problem.

Karl Knechtel
nrj
    If a dictionary allowed duplicate keys with different associated values, which one would you expect to be retrieved when you look up the value for such a key later? – martineau May 19 '12 at 12:17

9 Answers

157

Python dictionaries don't support duplicate keys. One way around this is to store lists or sets inside the dictionary.

One easy way to achieve this is by using defaultdict:

from collections import defaultdict

data_dict = defaultdict(list)

All you have to do is replace

data_dict[regNumber] = details

with

data_dict[regNumber].append(details)

and you'll get a dictionary of lists.
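Put together with a few rows of the sample data from the question, a minimal sketch might look like the following. The inlined `lines` list stands in for the file contents, and stripping the whitespace from each field is an addition that is not part of the original code.

```python
from collections import defaultdict

# A few rows from the question's sample data, inlined instead of reading the file
lines = [
    "EDF768, Bill Meyer, 2456, Vet_Parking",
    "TY5678, Jane Miller, 8987, AgHort_Parking",
    "GEF123, Jill Black, 3456, Creche_Parking",
    "EDF768, Jenny Meyer, 9987, Vet_Parking",
]

data_dict = defaultdict(list)
for line in lines:
    # Strip the space that follows each comma (an addition to the original code)
    regNumber, name, phoneExtn, carpark = [field.strip() for field in line.split(",")]
    # A missing key is created with an empty list automatically
    data_dict[regNumber].append((name, phoneExtn, carpark))

print(data_dict["EDF768"])
# [('Bill Meyer', '2456', 'Vet_Parking'), ('Jenny Meyer', '9987', 'Vet_Parking')]
```

Every value is a list, even for registration numbers that appear only once, so the code that reads the dictionary can treat all keys the same way.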

NPE
    I didn't, at first, understand that this is equivalent to declaring the dictionary key's value as a list yourself and appending to it. Eliminates a few lines of boilerplate though, which is nice. `if not my_key in data_dict:` `data_dict[my_key] = list()` – ThorSummoner Apr 30 '15 at 23:03
  • @ThorSummoner it can also be done using the `setdefault` method. – Karl Knechtel Feb 14 '23 at 17:57
52

You can change the behavior of the built-in types in Python. For your case it's really easy to create a dict subclass that will automatically store duplicated values in lists under the same key:

class Dictlist(dict):
    def __setitem__(self, key, value):
        try:
            self[key]
        except KeyError:
            super(Dictlist, self).__setitem__(key, [])
        self[key].append(value)

Output example:

>>> d = Dictlist()
>>> d['test'] = 1
>>> d['test'] = 2
>>> d['test'] = 3
>>> d
{'test': [1, 2, 3]}
>>> d['other'] = 100
>>> d
{'test': [1, 2, 3], 'other': [100]}
Peter Mortensen
Scorpil
16

Rather than using a defaultdict or messing around with membership tests or manual exception handling, use the setdefault method to add new empty lists to the dictionary when they're needed:

results = {}                              # use a normal dictionary for our output
for k, v in some_data:                    # the keys may be duplicates
    results.setdefault(k, []).append(v)   # magic happens here!

setdefault checks to see if the first argument (the key) is already in the dictionary. If it doesn't find anything, it assigns the second argument (the default value, an empty list in this case) as a new value for the key. If the key does exist, nothing special is done (the default goes unused). In either case though, the value (whether old or new) gets returned, so we can unconditionally call append on it (knowing it should always be a list).
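The return-value behavior described above can be seen in a small sketch. The key and the names stored here are borrowed from the question's sample data purely for illustration.

```python
results = {}

# Key is missing: the empty list is stored under the key and returned
first = results.setdefault("EDF768", [])
first.append("Bill Meyer")

# Key is present: the existing list is returned and the new [] goes unused
second = results.setdefault("EDF768", [])
second.append("Jenny Meyer")

print(first is second)  # True
print(results)          # {'EDF768': ['Bill Meyer', 'Jenny Meyer']}
```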

Karl Knechtel
Blckknght
  • I'd like to point out why you should eschew `.setdefault(k, []).append(v)`. For each key-value pair, a new list is created. This newly created list is stored in the dictionary if the key is absent, otherwise it is discarded. This results in a lot of temporary list creation and abandonment. `defaultdict(list)` only calls the factory method if the key does not exist, so unneeded lists are never created. – AJNeufeld Oct 22 '21 at 17:09
  • That's a very good point. The default value used with `setdefault` does indeed need to be fully instantiated up front, before the call gets made to see if it's actually needed. For an empty list the performance cost is small, but not entirely trivial. For a more heavyweight object (like, say, a large `numpy` array) it might be prohibitive. So use this solution when it simplifies your code (that's already using plain dictionaries) if performance isn't critical, but pick one of the alternatives in any case where creating extra objects is problematic. – Blckknght Oct 22 '21 at 22:53
9

You can't have a dict with duplicate keys, by definition! Instead, you can use a single key and, as its value, a list of the elements that had that key.

So you can follow these steps:

  1. Check whether the current element's key (from your initial data) is already in the final dict. If it is, go to step 3.
  2. Add the key to the dict, with an empty list as its value.
  3. Append the new value to the dict[key] list.
  4. Repeat steps 1-3 for each remaining element.
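
The steps above can be sketched in a few lines of Python. The `pairs` list and the name `final` are made up for this illustration, and step 2 is assumed to mean "create the key with an empty list".

```python
# Hypothetical (key, value) pairs; the first key appears twice
pairs = [
    ("EDF768", "Bill Meyer"),
    ("TY5678", "Jane Miller"),
    ("EDF768", "Jenny Meyer"),
]

final = {}
for key, value in pairs:       # step 4: repeat for every element
    if key not in final:       # step 1: is the key already present?
        final[key] = []        # step 2: add the key with an empty list
    final[key].append(value)   # step 3: append the value to the list

print(final)
# {'EDF768': ['Bill Meyer', 'Jenny Meyer'], 'TY5678': ['Jane Miller']}
```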
Dave Mackey
DonCallisto
7

If you want to have lists only when they are necessary, and values in any other cases, then you can do this:

class DictList(dict):
    def __setitem__(self, key, value):
        try:
            # Assumes there is a list on the key
            self[key].append(value)
        except KeyError: # If it fails, because there is no key
            super(DictList, self).__setitem__(key, value)
        except AttributeError: # If it fails because it is not a list
            super(DictList, self).__setitem__(key, [self[key], value])

You can then do the following:

dl = DictList()
dl['a']  = 1
dl['b']  = 2
dl['b'] = 3

Which will store the following {'a': 1, 'b': [2, 3]}.


I tend to use this implementation when I want to have reverse/inverse dictionaries, in which case I simply do:

my_dict = {1: 'a', 2: 'b', 3: 'b'}
rev = DictList()
for k, v in my_dict.items():
    rev[v] = k

Which will generate the same output as above: {'a': 1, 'b': [2, 3]}.


CAVEAT: This implementation relies on the non-existence of the append method (in the values you are storing). This might produce unexpected results if the values you are storing are lists. For example,

dl = DictList()
dl['a']  = 1
dl['b']  = [2]
dl['b'] = 3

would produce the same result as before {'a': 1, 'b': [2, 3]}, but one might expect the following: {'a': 1, 'b': [[2], 3]}.

Peter Mortensen
toto_tico
  • "If you want to have lists only when they are necessary, and values in any other cases" - **this is a bad thing to want**. The code that **uses** the dict will then have to include corresponding logic to check whether the value is a list or a plain element (and it's that much harder if the individual values are supposed to be lists themselves!). As the Zen of Python tells us, "special cases aren't special enough to break the rules." – Karl Knechtel Feb 14 '23 at 18:01
6

You can refer to the following article: http://www.wellho.net/mouth/3934_Multiple-identical-keys-in-a-Python-dict-yes-you-can-.html

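In a dict, if the keys are instances of a class that does not define __eq__ or __hash__, each instance counts as a distinct key, so there are no duplicate-key problems.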

For example:

class p(object):
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return self.name
    def __str__(self):
        return self.name
d = {p('k'): 1, p('k'): 2}
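
A short sketch of how such a dict behaves may help here. It reuses the class `p` from the example, while `k1`, `k2`, and `matches` are names introduced only for illustration. Since each instance is its own key, looking up a fresh `p('k')` finds nothing, and collecting all values for the name 'k' means scanning the items.

```python
class p(object):
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return self.name

k1, k2 = p('k'), p('k')
d = {k1: 1, k2: 2}

print(len(d))        # 2: both entries are kept
print(d[k1], d[k2])  # 1 2: lookups need the original key objects
print(p('k') in d)   # False: a new instance is a different key

# Collecting every value whose key is named 'k' requires a linear scan
matches = [value for key, value in d.items() if key.name == 'k']
print(matches)       # [1, 2]
```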
Peter Mortensen
xiansweety
    How to get all values with the key 'k'? The only way to do this is sequential comparing, which loses the meaning of using a hash dictionary. – minion Nov 02 '18 at 09:36
  • Clean and elegant answer. Thanks! The corollary is that, if you're already planning to use objects then you don't need to do anything. See full example using OrderedDict: https://stackoverflow.com/a/56959984/1732392 – Feiteira Jul 09 '19 at 20:20
    @minion is right here. Either you store references to those objects to access the values, or you have to iterate. In the former case you might as well just put the data in the key object and forget the dictionary, in the latter case you can just use a list of tuples. This doesn't really do what it says on the tin - you are just making the keys different. That might solve your problem, but at that point, the dictionary becomes the wrong data structure and you just have a layer of indirection you don't need. – Gareth Latty Jul 09 '19 at 20:34
  • Okay this solution works perfectly for cases where you just need a dict and you don't want to retrieve the values via the key or something, in other words you don't need to do anything other than getting the data structure. – 0xInfection Aug 25 '20 at 05:28
2

You can't have duplicated keys in a dictionary. Use a dict of lists:

for line in data_list:
  regNumber = line[0]
  name = line[1]
  phoneExtn = line[2]
  carpark = line[3].strip()
  details = (name,phoneExtn,carpark)
  if regNumber not in data_dict:  # dict.has_key() was removed in Python 3
    data_dict[regNumber] = [details]
  else:
    data_dict[regNumber].append(details)
Peter Mortensen
Oskarbi
1

It's a pretty old question, but maybe my solution will help someone.

By overriding the __hash__ magic method, you can store equal objects as separate keys in a dict.

Example:

from random import choices

class DictStr(str):
    """
        This class behaves exactly like the str class, but
        equal values can be stored as separate keys in a dict
    """
    def __new__(cls, value='', custom_id='', id_length=64):
        # If you want to know why I use __new__ instead of __init__
        # SEE: https://stackoverflow.com/a/2673863/9917276
        obj = str.__new__(cls, value)
        if custom_id:
            obj.id = custom_id
        else:
            # Make a random string of id_length characters (64 by default)
            choice_str = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890"
            obj.id = ''.join(choices(choice_str, k=id_length))
        return obj

    def __hash__(self) -> int:
        return self.id.__hash__()

Now lets create a dict:

>>> a_1 = DictStr('a')
>>> a_2 = DictStr('a')
>>> a_3 = 'a'
>>> a_1
'a'
>>> a_2
'a'
>>> a_1 == a_2 == a_3
True
>>> d = dict()
>>> d[a_1] = 'some_data'
>>> d[a_2] = 'other'
>>> print(d)
{'a': 'some_data', 'a': 'other'}

NOTE: This solution can be applied to any basic data type (int, float, ...).

EXPLANATION :

Almost any object can be used as a key in a dict (known as a HashMap or hash table in other languages), but the dict needs a way to tell keys apart, because it knows nothing about the objects themselves.

For this purpose, an object that is to be used as a dictionary key has to provide an identifying number for itself (I call it uniq_id; it is actually a number produced by a hash algorithm).

Because dictionaries are used so widely, most programming languages hide the generation of this uniq_id inside a built-in hash method that the dict uses when searching for a key.

So if you change the hash method of your class, you can change how its instances behave as dictionary keys.
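
To make the lookup rule concrete, here is a minimal sketch. The `Tagged` class is hypothetical and simpler than `DictStr`: the hash comes from an explicit `tag` argument instead of a random id. It assumes CPython's usual behavior, where a stored key matches only if its hash is equal and the objects then compare equal (or are the same object).

```python
class Tagged(str):
    """Hypothetical str subclass: equality comes from str, the hash from a tag."""
    def __new__(cls, value, tag):
        obj = str.__new__(cls, value)
        obj.tag = tag
        return obj

    def __hash__(self):
        return hash(self.tag)

a_1 = Tagged('a', tag=1)
a_2 = Tagged('a', tag=2)

d = {a_1: 'some_data', a_2: 'other'}
print(a_1 == a_2)              # True: compared as plain strings
print(len(d))                  # 2: different hashes, so two separate keys
print(d[Tagged('a', tag=1)])   # some_data: same hash and equal value
print(Tagged('a', tag=3) in d) # False: equal value, but no key has this hash
```

With the random ids used by `DictStr`, a key can only be found again through the original object (or a copy created with the same `custom_id`).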

  • This is a bad idea, because it will interfere with trying to look up values in the dictionary. The `d` dict in the last example can't be indexed with either the string `'a'` or with a new `DictStr('a')`; the original `DictStr` objects need to be remembered and tracked. – Karl Knechtel Feb 14 '23 at 18:04
  • This is functionally the same approach as @xiansweety's prior answer. – Karl Knechtel Feb 14 '23 at 18:06
0

A dictionary does not support duplicate keys; instead, you can use defaultdict.
Below is an example of how to use defaultdict in Python 3.x to solve your problem:

from collections import defaultdict

sdict = defaultdict(list)
keys_bucket = list()

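# contents is the text of the data file, read as in the question
data_list = [lines.split(",") for lines in contents.split("\n")]
for data in data_list:
    key = data.pop(0)
    detail = data
    
    keys_bucket.append(key)
    # key was appended just above, so this test is always true and the
    # else branch never runs; sdict[key].append(detail) alone is enough
    if key in keys_bucket:
        sdict[key].append(detail)
    else:
        sdict[key] = detail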

print("\n", dict(sdict))


The above code would produce the following output:

{'EDF768': [[' Bill Meyer', ' 2456', ' Vet_Parking'], [' Jenny Meyer', ' 9987', ' Vet_Parking']], 'TY5678': [[' Jane Miller', ' 8987', ' AgHort_Parking'], [' Jo King', ' 8987', ' AgHort_Parking']], 'GEF123': [[' Jill Black', ' 3456', ' Creche_Parking']], 'ABC234': [[' Fred Greenside', ' 2345', ' AgHort_Parking']], 'GH7682': [[' Clara Hill', ' 7689', ' AgHort_Parking']], 'JU9807': [[' Jacky Blair', ' 7867', ' Vet_Parking'], [' Mike Green', ' 3212', ' Vet_Parking']], 'KLOI98': [[' Martha Miller', ' 4563', ' Vet_Parking']], 'ADF645': [[' Cloe Freckle', ' 6789', ' Vet_Parking']], 'DF7800': [[' Jacko Frizzle', ' 4532', ' Creche_Parking']], 'WER546': [[' Olga Grey', ' 9898', ' Creche_Parking']], 'HUY768': [[' Wilbur Matty', ' 8912', ' Creche_Parking']]}
  • This approach to the problem was already explained many times. Please look at existing answers before writing a new one. (Aside from that, the **entire point** of using `defaultdict` is so that it **won't be necessary** to check whether the key is a duplicate.) – Karl Knechtel Feb 14 '23 at 18:07