25

I have this index as a dict.

index = {
    'Testfil2.txt': ['nisse', 'hue', 'abe', 'pind'],
    'Testfil1.txt': ['hue', 'abe', 'tosse', 'svend']}

I need to invert the index so it will be a dict with duplicates of values merged into one key with the 2 original keys as values, like this:

inverse = {
    'nisse': ['Testfil2.txt'],
    'hue': ['Testfil2.txt', 'Testfil1.txt'],
    'abe': ['Testfil2.txt', 'Testfil1.txt'],
    'pind': ['Testfil2.txt'], 
    'tosse': ['Testfil1.txt'],
    'svend': ['Testfil1.txt']}

My textbook has this function for inverting dictionaries:

def invert_dict(d): 
    inverse = dict() 
    for key in d: 
        val = d[key] 
        if val not in inverse: 
            inverse[val] = [key] 
        else: 
            inverse[val].append(key) 
    return inverse

It works fine for simple key:value pairs, BUT, when I try that function with a dict that has lists as values such as my index, I get this error message:

Traceback (most recent call last):
  File "<pyshell#153>", line 1, in <module>
    invert_dict(index)
  File "<pyshell#150>", line 5, in invert_dict
    if val not in inverse:
TypeError: unhashable type: 'list'

The book is no help, and I suspect that I can use tuples in some way, but I am not sure how.

wjandrea
  • 28,235
  • 9
  • 60
  • 81
Vestergaardish
  • 267
  • 1
  • 3
  • 10

6 Answers6

28

My solution for reversing a dictionary.

inverse = {}
for k,v in index.items():
    for x in v:
        inverse.setdefault(x, []).append(k)

Output:

{'nisse': ['Testfil2.txt'],
 'hue': ['Testfil2.txt', 'Testfil1.txt'],
 'abe': ['Testfil2.txt', 'Testfil1.txt'],
 'pind': ['Testfil2.txt'],
 'tosse': ['Testfil1.txt'],
 'svend': ['Testfil1.txt']}
wjandrea
  • 28,235
  • 9
  • 60
  • 81
ᴀʀᴍᴀɴ
  • 4,443
  • 8
  • 37
  • 57
  • 4
    FYI, the whole `try`/`except` nonsense could be shortened significantly by either making `new_dic` a `collections.defaultdict(list)` or with a plain `dict`, replacing the entire `try`/`except` with just `new_dic.setdefault(x, []).append(k)`, avoiding the need to handle "key exists" and "key missing" separately. – ShadowRanger Feb 18 '16 at 20:07
19

I've tried around and you want to use val not in inverse but it can't be checked if a "list is in a dict". (val is a list)

For your code a simple change will do what you want:

def invert_dict(d): 
    inverse = dict() 
    for key in d: 
        # Go through the list that is saved in the dict:
        for item in d[key]:
            # Check if in the inverted dict the key exists
            if item not in inverse: 
                # If not create a new list
                inverse[item] = [key] 
            else: 
                inverse[item].append(key) 
    return inverse
ShadowRanger
  • 143,180
  • 12
  • 188
  • 271
MSeifert
  • 145,886
  • 38
  • 333
  • 352
4

As a nested comprehension:

inverse = { v: k for k, l in index.items() for v in l }

or, perhaps more clearly:

inverse = { 
            new_key: index_key                              #body
            for index_key, index_value in index.items()     #outer loop
                for new_key in index_value                  #inner loop
            }

which is roughly equivalent to:

new_keys    =   []
new_values  =   []

for index_key, index_value in index.items():
    for new_key in index_value:
        new_keys.append(new_key)
        new_values.append(index_key)
        
inverse     =   dict(zip(new_keys,new_values))
Jeremy
  • 136
  • 1
  • 5
  • This is the most pythonic version. 1 line. – blacktj Dec 04 '22 at 11:43
  • 2
    This does not work for OP's specified input. The values in the inverted dict need to be lists as well, in case there are elements common to multiple value-lists in the original. – Karl Knechtel Mar 17 '23 at 04:59
2

You can not use list objects as dictionary keys, since they should be hashable objects. You can loop over your items and use dict.setdefault method to create the expected result:

new = {}
for k,value in index.items():
    for v in value:
        new.setdefault(v, []).append(k)

Result:

{'nisse': ['Testfil2.txt'],
 'hue': ['Testfil2.txt', 'Testfil1.txt'],
 'abe': ['Testfil2.txt', 'Testfil1.txt'],
 'pind': ['Testfil2.txt'],
 'tosse': ['Testfil1.txt'],
 'svend': ['Testfil1.txt']}

and if you are dealing with larger datasets for refusing of calling creating an empty list at each calling the setdefault() method you can use collections.defaultdict() which will calls the missing function just when it encounter a new key.

from collections import defaultdict

new = defaultdict(list)
for k,value in index.items():
    for v in value:
        new[v].append(k)

Result:

defaultdict(<type 'list'>,
    {'nisse': ['Testfil2.txt'],
     'hue': ['Testfil2.txt', 'Testfil1.txt'],
     'abe': ['Testfil2.txt', 'Testfil1.txt'],
     'pind': ['Testfil2.txt'],
     'tosse': ['Testfil1.txt'],
     'svend': ['Testfil1.txt']})
wjandrea
  • 28,235
  • 9
  • 60
  • 81
Mazdak
  • 105,000
  • 18
  • 159
  • 188
  • Do you actually get a performance improvement with `defaultdict`? From what I remember from a related discussion, it's implemented in Python which makes it pretty slow while `dict.setdefault` is implemented in C and the time to create a new empty list is negligible. – wjandrea Aug 21 '23 at 00:00
-1

Here is a variation that uses a comprehension plus set to remove duplicates.

def invert_setdict(setdict):
    inverse = {}
    vk = [(v, k) for k, vs in index.items() for v in vs]
    for k, v in vk:
       inverse.setdefault(k, set()).add(v)

    return inverse

Example

>>> index = {
... 'Testfil2.txt': ['nisse', 'hue', 'abe', 'pind'],
... 'Testfil1.txt': ['hue', 'abe', 'tosse', 'svend']}

>>> inverse = invert_setdict(index)
>>> inverse
{'nisse': {'Testfil2.txt'},
 'hue': {'Testfil1.txt', 'Testfil2.txt'},
 'abe': {'Testfil1.txt', 'Testfil2.txt'},
 'pind': {'Testfil2.txt'},
 'tosse': {'Testfil1.txt'},
 'svend': {'Testfil1.txt'}}

If you want to convert the set values to lists:

>>> inverse = {k:list(v) for k, v in inverse.items()}
helix7
  • 34
  • 2
  • Remove duplicates? But OP's data doesn't have any duplicates... Neither does the data you're showing here... If this is filling a different need, you could post a separate question. – wjandrea Aug 20 '23 at 23:38
  • Note that sets are unordered. (And going back to lists doesn't restore the order.) If you want to preserve order, you can use a dict with `None` values as an ordered set. – wjandrea Aug 20 '23 at 23:42
  • There's no need to create `vk`; all it does is take up space. You can put its two loops directly in your code, just like in [Arman's answer](/a/35491335/4518341) – wjandrea Aug 20 '23 at 23:43
-2

Two liner solution using unpacking operator * and nested compression.

for k,v in old_dict.items():
    new_dict = {**new_dict,**{vi:k for vi in v}}
Erica Jh Lee
  • 121
  • 3
  • 12