-1

I have a list of dictionaries that share the same keys but have different values. Sometimes the list contains duplicates:

[{'colorName': u'red',
  'color_thumb': [],
  'main_zoom_picture': u'webcontent/0007/991/393/cn7991393.jpg',
  'pic_uris': [(u'S', u'webcontent/0007/991/248/cn.jpg')],
  'swatch_image_path': u'webcontent/0007/991/248/cn7991248.jpg'},
 {'colorName': u'red',
  'color_thumb': [],
  'main_zoom_picture': u'webcontent/0007/991/393/cn7991393.jpg',
  'pic_uris': [(u'S', u'webcontent/0007/991/248/cn.jpg')],
  'swatch_image_path': u'webcontent/0007/991/248/cn7991248.jpg'}]

I'm doing:

[dict(tupleized) for tupleized in set(tuple(item.items()) for item in shared_list)]

And receiving:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
Martijn Pieters
user3761151

3 Answers

4

This does the trick on your set of data for me.

from itertools import groupby

print [k for k,v in groupby(sorted(shared_list))]

Taken from this question
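The one-liner above relies on Python 2 being able to sort dictionaries directly. On Python 3, sorted() over a list of dicts raises a TypeError because dicts do not support "<". A minimal sketch of the same sort-then-group idea that also runs on Python 3 is shown here; sort_key and dedupe_sorted are hypothetical helper names, and the sketch assumes the dictionary keys are mutually comparable (for example, all strings):

```python
from itertools import groupby


def sort_key(d):
    # Hypothetical helper: a deterministic, sortable string for a dict.
    # Sorting the items first makes the key independent of insertion order.
    return repr(sorted(d.items()))


def dedupe_sorted(dicts):
    # groupby() only merges *adjacent* equal keys, so sort first.
    ordered = sorted(dicts, key=sort_key)
    # Keep the first dictionary from every group of duplicates.
    return [next(group) for _, group in groupby(ordered, key=sort_key)]


sample = [
    {'colorName': 'red', 'color_thumb': [], 'pic_uris': [('S', 'a.jpg')]},
    {'colorName': 'red', 'color_thumb': [], 'pic_uris': [('S', 'a.jpg')]},
    {'colorName': 'blue', 'color_thumb': [], 'pic_uris': [('S', 'b.jpg')]},
]
unique = dedupe_sorted(sample)
```

Note that this approach returns the dictionaries in sorted order, not in their original order.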

Community
Christoph Hegemann
1

You have your loops inverted; you need to loop over shared_list first:

[dict(tupleized) for item in shared_list for tupleized in set(tuple(item.items()))]

A list comprehension lists the loops in nesting order; left is outermost.

The next problem is that your values contain lists, which are unhashable and so cannot be added to a set unaltered.
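As a small illustration of that hashing rule (the values are shortened from the question's data, and the variable names are only for this sketch), a tuple is hashable only when everything inside it is hashable too:

```python
# A (key, value) pair whose value is a tuple of tuples: hashable.
hashable_pair = ('pic_uris', (('S', 'webcontent/0007/991/248/cn.jpg'),))
# The same pair with a list as the value: not hashable.
unhashable_pair = ('pic_uris', [('S', 'webcontent/0007/991/248/cn.jpg')])

seen = set()
seen.add(hashable_pair)  # works: tuples all the way down

try:
    seen.add(unhashable_pair)  # hashing the outer tuple hashes the inner list
    failed = False
    message = ''
except TypeError as exc:
    failed = True
    message = str(exc)
```

This is the same TypeError the question shows, raised as soon as the set tries to hash a tuple that holds a list.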

Finally, the set has to live outside the loop, so that you can test whether a dictionary has been seen before:

def immutable_repr(d):
    # Pair every key with its value; list values become tuples so they hash.
    return tuple((k, tuple(v) if isinstance(v, list) else v)
                 for k, v in sorted(d.items()))

seen = set()
[d for d in shared_list if immutable_repr(d) not in seen and not seen.add(immutable_repr(d))]

Here immutable_repr() takes care of producing an immutable tuple from each dictionary:

>>> immutable_repr(shared_list[0])
(('colorName', u'red'), ('color_thumb', ()), ('main_zoom_picture', u'webcontent/0007/991/393/cn7991393.jpg'), ('pic_uris', ((u'S', u'webcontent/0007/991/248/cn.jpg'),)), ('swatch_image_path', u'webcontent/0007/991/248/cn7991248.jpg'))

The sorting ensures that the test for having seen a dictionary still works when two equal dictionaries list their keys in a different order (the order can vary with the insertion and deletion history of the dictionary).
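To see why the sort matters, here is a short sketch that assumes Python 3.7 or later, where dictionaries iterate in insertion order; the two sample dictionaries are made up for the illustration:

```python
# Two equal dictionaries built in a different key order.
a = {'colorName': 'red', 'color_thumb': ()}
b = {'color_thumb': (), 'colorName': 'red'}

# Without sorting, the item tuples follow insertion order and differ.
unsorted_a = tuple(a.items())
unsorted_b = tuple(b.items())

# With sorting, both dictionaries produce the same tuple.
sorted_a = tuple(sorted(a.items()))
sorted_b = tuple(sorted(b.items()))
```

The dictionaries compare equal, but only the sorted tuples are guaranteed to match, so only they are safe to use as set members for duplicate detection.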

The seen set tracks which dictionaries have been encountered so far, so that any subsequent duplicates are filtered out:

>>> from pprint import pprint
>>> seen = set()
>>> pprint([d for d in shared_list if immutable_repr(d) not in seen and not seen.add(immutable_repr(d))])
[{'colorName': u'red',
  'color_thumb': [],
  'main_zoom_picture': u'webcontent/0007/991/393/cn7991393.jpg',
  'pic_uris': [(u'S', u'webcontent/0007/991/248/cn.jpg')],
  'swatch_image_path': u'webcontent/0007/991/248/cn7991248.jpg'}]
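The same idea can also be written as an explicit loop, which avoids calling the helper twice per dictionary and avoids the side effect inside the comprehension. This is only a sketch under some assumptions: freeze and unique_dicts are hypothetical names, the conversion is made recursive so nested lists or dictionaries are handled as well, and lists and tuples with the same contents are treated as equal.

```python
def freeze(value):
    """Hypothetical helper: recursively turn dicts and lists into tuples."""
    if isinstance(value, dict):
        # Sort by key so insertion order does not affect the result.
        return tuple((k, freeze(v)) for k, v in sorted(value.items()))
    if isinstance(value, (list, tuple)):
        return tuple(freeze(v) for v in value)
    return value


def unique_dicts(dicts):
    """Return the dictionaries in original order, dropping later duplicates."""
    seen = set()
    result = []
    for d in dicts:
        key = freeze(d)
        if key not in seen:
            seen.add(key)
            result.append(d)
    return result


shared_list = [
    {'colorName': 'red', 'color_thumb': [], 'pic_uris': [('S', 'a.jpg')]},
    {'colorName': 'red', 'color_thumb': [], 'pic_uris': [('S', 'a.jpg')]},
    {'colorName': 'blue', 'color_thumb': [], 'pic_uris': [('S', 'b.jpg')]},
]
deduplicated = unique_dicts(shared_list)
```

The returned list holds the original dictionaries unchanged, with their list values intact; only the lookup keys are converted to tuples.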
Martijn Pieters
0

Martijn Pieters' explanation is correct about your error in the list comprehension. In Python 2, items() returns a list of (key, value) tuples for each dict, and some of those values are themselves lists. A set cannot hash a tuple that contains a list.

However, you can store a tuple of tuples within a set, provided every value inside it is hashable. So once the list values have been converted to tuples, you can make the following change to your line of code:

>>> [dict(tupleized) for tupleized in set([tuple(tup for tup in item.items()) for item in shared_list])]

Also, Christoph Hegemann's answer is very elegant. If you have the time and inclination, please check itertools (I discovered it recently and it's great to use).

My apologies to Christoph, because I would up-vote your answers but I just became a community member recently. So I have no rep :(

rainman