1

I want to check if an entire dictionary (both the key and value) exists in a list of dictionaries. Each dictionary can be a nested dictionary of dictionaries and lists.

When I have many scalars that I want to check against a target list of scalars, I usually convert the target list into a set and check existence in the set, like scalar in set(list_of_scalars). (Please let me know if this is not the best way to have been doing this.)

For dicts, I can't do my_dict in set(list_of_dicts) because that raises TypeError: unhashable type: 'dict'.

Doing my_dict in list_of_dicts seems to correctly return False if the same key name exists but the value is different (which is what I want), but I'm concerned about the runtime; does Python optimize this internally? What else could I be doing?

EDIT: Assume I will be performing MANY lookups, and using Python3.7

gunit
  • `scalar in set(list_of_scalars)` seems less concise than simply `scalar in list_of_scalars`. If you're thinking "but isn't membership testing faster for sets than lists?", that's true, but you're also converting from list to set, so you're losing more time than you gain. I don't recommend `my_dict in list_of_dicts` if you want to recurse arbitrarily deeply to find the dict. For example, `{1:2} in [{3: {1:2}}]` gives False. – Kevin Jan 28 '19 at 19:57
  • are the dicts just made up of python simple types? ie (int/list/tuple/dict/str) and if so what python version are you using? – Joran Beasley Jan 28 '19 at 20:17
  • the dicts can be nested (dicts of dicts and lists). I am using python3 – gunit Jan 28 '19 at 20:20
  • 1
    if you know where the dicts are coming from and you can order your dicts the same way, you can abuse the mechanics of dicts in 3.6+ and use `str(needle_dict) in str(list_of_dicts)`, since in 3.6+ insertion order of dicts is guaranteed – Joran Beasley Jan 28 '19 at 20:21

2 Answers

3

To check if a scalar exists within a list of scalars, I usually make the list into a set and check existence in the set, like scalar in set(list_of_scalars). (Please let me know if this is already not the best way to have been doing this)

The creation of the set will be an O(n) operation. Every subsequent lookup in the set will be O(1) average case, so if you are planning on performing many lookups it is worthwhile. Otherwise, if you perform just one lookup, then you're better off performing a linear search in the list (assuming it's not sorted).
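A quick sketch of that tradeoff (the names here are illustrative, not from the question):

```python
list_of_scalars = list(range(100_000))
targets = [5, 50_000, 99_999, -1]

# One-off check: linear search through the list, O(n), no setup cost.
print(-1 in list_of_scalars)         # False

# Many lookups: pay O(n) once to build the set, then O(1) average per check.
scalar_set = set(list_of_scalars)
hits = [t for t in targets if t in scalar_set]
print(hits)                          # [5, 50000, 99999]
```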

For dicts, I can't do my_dict in set(list_of_dicts) because that raises TypeError: unhashable type: 'dict'. my_dict in list_of_dicts runs fine, but I'm concerned about the runtime;

If you need to repeatedly perform this lookup, then depending on the nature of what you are storing in those dictionaries, you may want to reconsider using dictionaries and opt for objects instead. Then you can define a __hash__ method for your objects, store them in a set, and lookups will be much cheaper.
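For example, a minimal sketch of what that could look like (Record and its fields are hypothetical names, not from the question):

```python
class Record:
    """Value-style object: equality and hashing use the same fields."""
    def __init__(self, key, value):
        self.key = key
        self.value = value

    def __eq__(self, other):
        return (isinstance(other, Record)
                and (self.key, self.value) == (other.key, other.value))

    def __hash__(self):
        # Must hash on the same fields used in __eq__.
        return hash((self.key, self.value))

records = {Record("a", 1), Record("b", 2)}
print(Record("a", 1) in records)   # True: set lookup uses __hash__/__eq__
print(Record("a", 2) in records)   # False: same key, different value
```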

does python optimize this internally? What else can I be doing?

You can look at the time complexity of operations on Python data structures here: TimeComplexity. Python has no way of optimizing a generic search in a generic list, so it will use linear-search behavior (O(n)).

vasia
0

To optimize multiple lookups, you can create a hashable dictionary class and search in a set of hashable dictionaries:

class hashabledict(dict):
    def __hash__(self):
        return hash(frozenset(self.items()))

l = [{1:2,3:4}, {5:6,7:8}]
setofdicts = set(map(hashabledict, l))
hashabledict({5:6,7:8}) in setofdicts
#True
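One caveat for the nested dicts the question mentions: a hashable-dict approach built on frozenset(dict.items()) fails as soon as any value is itself a dict or list, since those values are unhashable. A possible workaround (my sketch, not part of the original answer) is to recursively freeze the whole structure first:

```python
def freeze(obj):
    """Recursively turn dicts and lists into hashable equivalents."""
    if isinstance(obj, dict):
        return frozenset((k, freeze(v)) for k, v in obj.items())
    if isinstance(obj, (list, tuple)):
        return tuple(freeze(v) for v in obj)
    return obj  # scalars (int, str, ...) are already hashable

list_of_dicts = [{1: {2: [3, 4]}}, {5: 6}]
frozen = {freeze(d) for d in list_of_dicts}
print(freeze({1: {2: [3, 4]}}) in frozen)   # True
print(freeze({1: {2: [3, 5]}}) in frozen)   # False
```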
DYZ