5

I'm writing a django application where I will get a dictionary from the user that can be of variable size. I want to have a limit for how big the dictionary can be, i.e. how many (key, value) pairs it can hold. I want it to be no bigger than 200. I suspect that if I do:

if len(user_dict)>200:
    raise ValidationError("dict has too many (key, value) pairs")

python will have to count the whole dict. If the dict is huge, because of a malicious user, this will eat unnecessary processing power. Or does the dict keep track of how many objects it holds, meaning len(user_dict) is a simple lookup operation? What is the best way to solve this problem?

I was thinking something like:

i=0
for key in user_dict.keys():
    i += 1
    if i>200:
        raise ValidationError("dict has too many (key, value) pairs")
Sahand
  • 7,980
  • 23
  • 69
  • 137
  • 5
    If `user_dict` is a dictionary, then `len(..)` works in *O(1)* so constant time. – Willem Van Onsem Jan 04 '18 at 13:32
  • Willem, thank you, that answers my question. – Sahand Jan 04 '18 at 13:33
  • As Willem says - the number of keys in a dictionary is stored as part of the object itself - it doesn't need to compute it. (same with the builtin `list`, `set`, `tuple` container types) – Jon Clements Jan 04 '18 at 13:33
  • 1
    @AvinashRaj: noo... that is still *O(1)* but will be easily 200 times slower. – Willem Van Onsem Jan 04 '18 at 13:37
  • @WillemVanOnsem so you mean `len(dict_)` and for computing the first item `for i in dict_: break` will take the same time. – Avinash Raj Jan 04 '18 at 13:43
  • @AvinashRaj: `len(..)` works in constant time (and performs a lookup on a field). So that means that obtaining the `len(..)` of a dictionary with 100k elements, is the same as obtaining the `len(..)` of a dictionary with no elements at all (approximately, of course cache etc. can lead to small differences). – Willem Van Onsem Jan 04 '18 at 13:44

1 Answers1

6

Or does the dict keep track of how many objects it holds, meaning len(user_dict) is a simple lookup operation?

A dictionary - given a serious interpreter implementation like CPython - indeed keeps track of the number of key-value pairs that are stored in the dictionary. So if user_dict is indeed a dictionary, then len(user_dict) works in O(1) and is very fast. The fact that it works in constant time also means that it makes no (theoretical) difference whether we calculate the len(..) of a dict object with 100k items, or none at all.

No iteration is necessary to count the number of objects. For instance the CPython source code for the dict class has:

static Py_ssize_t
dict_length(PyDictObject *mp)
{
    return mp->ma_used;
}

So it returns the ma_used field of the dictionary object (which is thus a field containing the number of items in the dictionary).

This is also described in this file:

Dictionaries: dict and defaultdict
                               Complexity
Operation     | Example      | Class         | Notes
--------------+--------------+---------------+-------------------------------
Index         | d[k]         | O(1)      |
Store         | d[k] = v     | O(1)      |
Length        | len(d)       | O(1)      |
Delete        | del d[k]     | O(1)      |
get/setdefault| d.method     | O(1)      |
Pop           | d.pop(k)     | O(1)      |
Pop item      | d.popitem()  | O(1)      |
Clear         | d.clear()    | O(1)      | similar to s = {} or = dict()
View          | d.keys()     | O(1)      | same for d.values()

Construction  | dict(...)    | O(len(...))   | depends # (key,value) 2-tuples

Iteration     | for k in d:  | O(N)          | all forms: keys, values, items
Willem Van Onsem
  • 443,496
  • 30
  • 428
  • 555