
I've developed a program in Python, organizing data in flattened dictionaries. As the size of the dicts increases, the program becomes slower due to intensive key lookups. Looking at the nested dictionary structure, it seems to me that a "hierarchical" approach may speed up key lookups. Am I wrong?

Is the nested dict:

nested_dict = { 'dictA': {'key_1': 'value_1', 'key_2': 'value_2'},
                'dictB': {'key_3': 'value_3', 'key_4': 'value_4', 'key_5': 'value_5'},
                ...
                'dictZ': {'key_m': 'value_m', 'key_n': 'value_n'}}

faster than the flattened dict:

dictionary = {'key_1': 'value_1',
              'key_2': 'value_2',
              ...
              'key_n': 'value_n'}

Edit: Added a few code examples

Below is a piece of the code I generally use. The program is quite big, so there isn't one specific piece of code to be evaluated.

Assignment:

dictionary['key_1'] = dictionary2['key_a']
dictionary['key_3'] = dictionary2['key_a']*dictionary['key_4']

Conditional statement:

if( (0 == dictionary['key_1']) and
    (dictionary2['key_b'] >= dictionary['key_3']) ):
  • How are you searching for a key? If you are searching a nested dict for a key, then a flat one will be significantly faster. – Error - Syntactical Remorse Apr 24 '19 at 20:40
  • What dictionary operations are you doing a lot of? Adding new keys, looking up keys, etc. How big are your dictionaries getting? Hundreds, thousands, millions of items? – Patrick Haugh Apr 24 '19 at 20:41
  • I'm searching for keys using the following syntax: dictionary['key_1']. I have about 70 keys in the biggest dict. – Nem Apr 24 '19 at 20:45
  • Note that in the case of nested dictionaries you'll have to search using the `dictionary['dictA']['key1']` format. How will you keep track of which keys are in which parent dicts? The overhead of maintaining the nested structure may not be worth it. Your data structure should be based on how you use it; in some cases it will make sense to have a hierarchical dict, in some cases a flattened one. – Verma Apr 24 '19 at 20:49
  • 70 keys is large for a human to conceptualize about, but tiny for a computer to process. If you are encountering any performance woes, it is not due to the size of that dict. In your case, nested vs flat should make no difference whatsoever. – Tom Lubenow Apr 24 '19 at 21:01
  • Having a nested dict is fine for me; I feel more comfortable with a hierarchical structure than a flattened one. My flattened dict has heterogeneous data, and organizing it with sub-dicts will probably help code readability. – Nem Apr 24 '19 at 21:04
  • @Tom Lubenow, the issue is not the size of the dict, it's the number of accesses to the keys. Besides this, I'd like to understand whether I have to change my approach, and on that point your answer is quite clear. – Nem Apr 24 '19 at 21:12
  • 70 keys is not big at all. The speed of accessing items in a dictionary does not depend on the size of the dict. Theoretically, it should take the same amount of time to access an item in a 2-element dictionary as in a thousand-element dictionary (the reality is a little more complicated, but not much). It's likely that the actual slowness is caused by something else. – Patrick Haugh Apr 24 '19 at 21:18
  • You should see this talk about how dictionaries are implemented https://www.youtube.com/watch?v=p33CVV29OG8 – Boris Verkhovskiy Apr 24 '19 at 21:40
  • The code in your conditional statement isn't doing searches. It does 3 dictionary look-ups, which are all relatively fast. If things slow down noticeably when the size of the dict increases, it's likely due to something else... and we'll need to see more code to really be able to make any suggestions. Consider supplying a [mcve] that illustrates and reproduces the issue. – martineau Apr 24 '19 at 21:46
  • @martineau. There isn't very much more to see in my code. I have a main cycle with hundreds of assignments and IF statements (there are more .py files involved). The main cycle is called 20 to 30 thousand times. What I saw is that, as I grow the dictionaries for better code maintainability and organization, the elapsed time increases more than expected. – Nem Apr 24 '19 at 22:10
  • Hmm, in that case you might be trying to optimize the wrong part of your code. I suggest you profile it and see where it's spending most of its execution time. See [How can you profile a Python script?](https://stackoverflow.com/questions/582336/how-can-you-profile-a-python-script). – martineau Apr 24 '19 at 22:28
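
For reference, a minimal profiling sketch along the lines of martineau's suggestion (the main_cycle function and the sample values are hypothetical stand-ins for the real program's loop):

import cProfile
import pstats

def main_cycle(dictionary, dictionary2):
    # Stand-in for the real main loop: hundreds of assignments
    # and conditionals over flat dict keys.
    dictionary['key_1'] = dictionary2['key_a']
    dictionary['key_3'] = dictionary2['key_a'] * dictionary['key_4']

d = {'key_1': 0, 'key_3': 0, 'key_4': 2}
d2 = {'key_a': 5, 'key_b': 7}

profiler = cProfile.Profile()
profiler.enable()
for _ in range(30_000):   # roughly the reported number of iterations
    main_cycle(d, d2)
profiler.disable()

# Show the 10 entries with the highest cumulative time.
pstats.Stats(profiler).sort_stats('cumulative').print_stats(10)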

1 Answer


Looking at the nested dictionary structure, it seems to me that a "hierarchical" approach may speed up key lookups. Am I wrong?

Yes :-)

A flat dict has O(1) lookup irrespective of its size. That is what makes hash tables so attractive as a data structure.

Adding hierarchy just adds extra hashing steps and lookup steps.
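
As a rough illustration (a minimal timeit sketch, not a rigorous benchmark; the key names follow the question's examples), a nested lookup pays for two hashes and two lookups instead of one, and the flat lookup does not get slower as the dict grows:

from timeit import timeit

flat = {f'key_{i}': i for i in range(70)}
nested = {'dictA': {f'key_{i}': i for i in range(70)}}
big_flat = {f'key_{i}': i for i in range(1_000_000)}

# One hash + one lookup versus two hashes + two lookups.
print(timeit("flat['key_42']", globals=globals()))
print(timeit("nested['dictA']['key_42']", globals=globals()))

# Lookup cost does not grow with the size of the dict.
print(timeit("big_flat['key_42']", globals=globals()))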

In some contexts, containers do get some cache locality benefits by being small, but in Python the containers have references to objects that are scattered all over memory, so compactness doesn't help much.

In addition, Python is an interpreted language, so adding an extra layer of lookups also entails more opcode evaluations. That would swamp any possible benefit from compactness.
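
One way to see the extra interpreter work (a sketch; the exact opcode names vary between Python versions) is to disassemble the two access patterns:

import dis

# Flat access: one constant load and one subscription opcode.
dis.dis("dictionary['key_1']")

# Nested access: an extra constant load and an extra subscription opcode.
dis.dis("nested_dict['dictA']['key_1']")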

Raymond Hettinger