There is a contradiction on whether the insert operation on dict() or add operation in set() is O(n) or O(1), where n is the length of the string.
Suppose we have strings which vary in length i.e. n1, n2, ...n_x. Then the time complexity of performing the following:
s = set()
d = dict()
for x in {N}: # where N = [n1, n2, ... n_x]
s.add(x)
d[x] = 1
is O(len(N) * Z)
where Z = len(n_1) + len(n_2) + ... len(n_x)
If we assume that add or insert is O(1) operation then the time complexity will be O(len(N)).
Is the above true?
From: http://svn.python.org/projects/python/trunk/Objects/stringobject.c we see that the computation of the hash depends on the length of the string which is what I am assuming is len below:
static long string_hash(PyStringObject *a)
{
register Py_ssize_t len;
register unsigned char *p;
register long x;
if (a->ob_shash != -1)
return a->ob_shash;
len = Py_SIZE(a);
p = (unsigned char *) a->ob_sval;
x = *p << 7;
while (--len >= 0)
x = (1000003*x) ^ *p++;
x ^= Py_SIZE(a);
if (x == -1)
x = -2;
a->ob_shash = x;
return x;
}
Here (efficiency of long (str) keys in python dictionary) someone showed that varying the length of the string does not affect the time of computing a hash. But this contradicts the above code.
From the following link we know that the hash value once computed is stored in the object. This means that lookup will be constant time O(1). Get dictionary keys hashes without recalculation However, the insertion/adding which is when the computation of the hash is done should be linear.