
I have a Python dictionary whose keys are strings of lowercase English letters and whose values are ints. There are exactly 5e6 unique keys, each a string of length exactly 10. Surprisingly, the lookups aren't taking much time: I was expecting execution to take around 4 s or more, but it isn't exceeding 2.5 s.

I converted the Python code to C++, with a map as the analogue of the dictionary. I tried map, unordered_map, and gp_hash_table, and all of them took more than 2 s in C++.

Here's the generator I used to produce the unique strings.

from sys import stdout

def increment(l):
    # Advance the list of characters to the next string in lexicographic
    # order, i.e. add 1 in base 26 (the input is never all 'z' here).
    n = len(l)
    i = n - 1
    while i >= 0 and l[i] == 'z':
        l[i] = 'a'
        i -= 1
    l[i] = chr(ord(l[i]) + 1)

print(5 * 10**6)

string = ['a' for i in range(10)]


for i in range(5 * 10**6):
    stdout.write(''.join(string) + '\n')
    increment(string)

string = ['a' for i in range(10)]

# Query count: only the first 10**6 of the strings below are read back as
# queries; both readers ignore the remaining lines.
print(10**6)
for i in range(5 * 10**6):
    stdout.write(''.join(string) + '\n')
    increment(string)

The output of this program is stored in a file named Strings.txt using the command `python3 Test.py > Strings.txt`, after which Strings.txt looks like this:

5000000
aaaaaaaaaa
aaaaaaaaab
aaaaaaaaac
aaaaaaaaad
aaaaaaaaae
aaaaaaaaaf
...
...
...
aaaaakymlr
1000000
aaaaaaaaaa
aaaaaaaaab
aaaaaaaaac
...
...
...
aaaaakymlq
aaaaakymlr

The following is the C++ code I was referring to above.

#include <bits/stdc++.h>
#include <ext/pb_ds/assoc_container.hpp>
using namespace std;
using namespace __gnu_pbds;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);
    int N = 0;
    cin >> N;
    map<string, int> freq;
    // unordered_map<string, int> freq;
    // gp_hash_table<string, int> freq;
    for(int i = 0; i < N; i++) {
        string s;
        cin >> s;
        freq[s]++;
    }
    int Q = 0;
    cin >> Q;
    for(int i = 0; i < Q; i++) {
        string s;
        cin >> s;
        cout << freq[s] << '\n';
    }
    return 0;
}

And the following is the Python 3 code I used.

from collections import defaultdict
from sys import stdin, stdout

input = stdin.readline

freq = defaultdict(int)
for i in range(int(input())):
    # note: readline keeps the trailing '\n', so every key includes it;
    # the query loop below reads lines the same way, so lookups still match
    freq[input()] += 1

for i in range(int(input())):
    stdout.write(str(freq[input()]) + '\n')

Here are the results when the above programs are executed.

suman@Skynet:~/Documents/String_Pairs$ python3 Test.py > Strings.txt

suman@Skynet:~/Documents/String_Pairs$ time python3 Dict.py < Strings.txt > P_out.txt

real    0m3.145s
user    0m2.662s
sys     0m0.164s
suman@Skynet:~/Documents/String_Pairs$ time python3 Dict.py < Strings.txt > P_out.txt

real    0m2.772s
user    0m2.568s
sys     0m0.204s


suman@Skynet:~/Documents/String_Pairs$ g++ -o exe Map.cpp -O2 -std=c++17
suman@Skynet:~/Documents/String_Pairs$ time ./exe < Strings.txt > Cpp_out.txt

real    0m2.346s
user    0m2.265s
sys     0m0.080s
suman@Skynet:~/Documents/String_Pairs$ time ./exe < Strings.txt > Cpp_out.txt

real    0m2.513s
user    0m2.417s
sys     0m0.096s


suman@Skynet:~/Documents/String_Pairs$ g++ -o exe Unordered_Map.cpp -O2 -std=c++17
suman@Skynet:~/Documents/String_Pairs$ time ./exe < Strings.txt > Cpp_out.txt

real    0m2.769s
user    0m2.660s
sys     0m0.108s
suman@Skynet:~/Documents/String_Pairs$ time ./exe < Strings.txt > Cpp_out.txt

real    0m2.806s
user    0m2.690s
sys     0m0.116s


suman@Skynet:~/Documents/String_Pairs$ g++ -o exe gp_hash_table.cpp -O2 -std=c++17
suman@Skynet:~/Documents/String_Pairs$ time ./exe < Strings.txt > Cpp_out.txt

real    0m2.099s
user    0m1.686s
sys     0m0.412s
suman@Skynet:~/Documents/String_Pairs$ time ./exe < Strings.txt > Cpp_out.txt

real    0m2.009s
user    0m1.605s
sys     0m0.404s
suman@Skynet:~/Documents/String_Pairs$

Now, the only thing I am concerned about is the fact that Python 3, despite being around 5x slower than C++ in general, is still competing with C++ when working with hash tables.

Is there any way I can beat the performance of Python's hash table?

Any help is greatly appreciated.

UPDATE: Here's an updated version that excludes the time taken for reading the strings.

#include <bits/stdc++.h>
#include <ext/pb_ds/assoc_container.hpp>
using namespace std;
using namespace __gnu_pbds;
using namespace chrono;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);
    int N = 0;
    cin >> N;

    vector<string> words(N);

    for(int i = 0; i < N; i++) {
        cin >> words[i];
    }

    // map<string, int> freq;
    // unordered_map<string, int> freq;
    gp_hash_table<string, int> freq;

    auto start = high_resolution_clock::now();
    for(string word : words) {
        freq[word]++;
    }

    auto end = high_resolution_clock::now();
    auto duration = duration_cast<microseconds>(end - start);
    cout << duration.count() / 1e6 << '\n';



    int Q = 0;
    cin >> Q;
    vector<string> queries(Q);
    for(int i = 0; i < Q; i++) {
        cin >> queries[i];
    }

    vector<int> results(Q);
    start = high_resolution_clock::now();
    for(int i = 0; i < Q; i++) {
        results[i] = freq[queries[i]];
    }
    end = high_resolution_clock::now();
    duration = duration_cast<microseconds>(end - start);
    cout << duration.count() / 1e6 << '\n';

    for(int i = 0; i < Q; i++) {
        cout << results[i] << '\n';
    }

    return 0;
}

The Python code:

from collections import defaultdict
from time import time
from sys import stdin, stdout

input = stdin.readline

freq = defaultdict(int)

strings = []

for i in range(int(input())):
    strings.append(input())

start = time()
for string in strings:
    freq[string] += 1
end = time()
print("%.4f" %(end - start))

queries = []
output = []

for i in range(int(input())):
    queries.append(input())

start = time()

for query in queries:
    output.append(freq[query])

end = time()

print("%.4f" %(end - start))

stdout.write('\n'.join(map(str, output)))

Even now, Python runs faster than C++. The results:

Cpp_out.txt (the build times for map, unordered_map, and gp_hash_table are all greater than 2 s):

2.28297
0.109844
1
1
...
...
...

P_out.txt

1.7818
0.1977
1
1
...
...

UPDATE 2: I have modified the code so that it excludes the time taken for reading and writing, and so that it uses references everywhere. Now, as expected, C++ beats Python 3 at hashing. Here are the benchmarks.

// CPP Code
#include <bits/stdc++.h>
#include <ext/pb_ds/assoc_container.hpp>
using namespace std;
using namespace __gnu_pbds;
using namespace chrono;

// djb2 string hash; note that as written it is never used, since freq below
// is declared without it (using it would require gp_hash_table<string, int, djb2>)
struct djb2 {
    unsigned long operator()(const string& str) const {
        unsigned long hash = 5381;
        for (auto c : str)
            hash = ((hash << 5) + hash) + c;
        return hash;
    }
};


int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);
    int N = 0;
    cin >> N;

    vector<string> words(N);

    for(int i = 0; i < N; i++) {
        cin >> words[i];
    }

    // map<string, int> freq;
    // unordered_map<string, int> freq;
    gp_hash_table<string, int> freq;

    auto start = high_resolution_clock::now();
    for(const string &word : words) {
        freq[word]++;
    }

    auto end = high_resolution_clock::now();
    auto duration = duration_cast<microseconds>(end - start);
    cout << duration.count() / 1e6 << '\n';



    int Q = 0;
    cin >> Q;
    vector<string> queries(Q);
    for(int i = 0; i < Q; i++) {
        cin >> queries[i];
    }

    vector<int> results(Q);
    start = high_resolution_clock::now();
    for(int i = 0; i < Q; i++) {
        results[i] = freq[queries[i]];
    }
    end = high_resolution_clock::now();
    duration = duration_cast<microseconds>(end - start);
    cout << duration.count() / 1e6 << '\n';

    for(int i = 0; i < Q; i++) {
        cout << results[i] << '\n';
    }

    return 0;
}
# Python3 Code
from collections import defaultdict
from time import time
from sys import stdin, stdout

input = stdin.readline

freq = defaultdict(int)

strings = []

for i in range(int(input())):
    strings.append(input())

start = time()
for string in strings:
    freq[string] += 1
end = time()
print("%.4f" %(end - start))

queries = []
output = []

for i in range(int(input())):
    queries.append(input())

start = time()

for query in queries:
    output.append(freq[query])

end = time()

print("%.4f" %(end - start))

stdout.write('\n'.join(map(str, output)))

Cpp_out.txt

1.60026
0.071471

P_out.txt

1.7849
0.1987

So it is clear that C++'s gp_hash_table beats Python 3's hash table.

I've gone through Python 3's hash table implementation. CPython hashes strings with something called SipHash. I want to generate strings such that the number of collisions when the strings are hashed is maximized. It would be something like a collision attack, but I want at least 5000 unique strings that produce the same hash.

Can anyone provide any kind of resource for this (note that I need at least 5000 unique strings that produce the same hash)?
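
A note on feasibility: full 64-bit SipHash collisions cannot be brute-forced without the secret key; resisting exactly this kind of attack is what SipHash was designed for. What can be brute-forced, if the seed is fixed, is the weaker goal of strings that land in the same dict slot, since a CPython dict indexes its table with the low bits of the hash. Below is a minimal sketch of that idea; the pretend table size of 2**12 slots and the search space are illustrative choices, and reaching 5000 same-slot strings would need a proportionally larger search.

# slot_collisions.py -- run as: PYTHONHASHSEED=0 python3 slot_collisions.py
# This does not break SipHash; it only finds strings whose hashes agree in
# the low bits, which is what selects the slot in a CPython dict table.
import itertools
from collections import defaultdict

BITS = 12                          # pretend the dict table has 2**12 slots
MASK = (1 << BITS) - 1
alphabet = 'abcdefghijklmnopqrstuvwxyz'

buckets = defaultdict(list)
for tup in itertools.product(alphabet, repeat=4):
    s = 'aaaaaa' + ''.join(tup)    # length-10 lowercase strings, 26**4 of them
    buckets[hash(s) & MASK].append(s)

worst = max(buckets.values(), key=len)
print(len(worst), worst[:5])       # typically well over 100 strings in one slot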

  • What does `Unordered_Map.cpp` contain? – kiner_shah Nov 09 '21 at 04:16
  • @kiner_shah Just the same script `Map.cpp` contains, the only difference being `unordered_map` is used instead of the other two (they are commented). – Sai Suman Chitturi Nov 09 '21 at 04:18
  • Is `defaultdict` really a hash table? Does it compute hashes and store the keys as per the hashes? – kiner_shah Nov 09 '21 at 04:23
  • @kiner_shah Yes, it is a hash table, at least from this answer https://stackoverflow.com/questions/114830/is-a-python-dictionary-an-example-of-a-hash-table/33459086 – Sai Suman Chitturi Nov 09 '21 at 04:29
  • You're including time to read and write > 50 megabytes. The results have essentially nothing to do with hash table performance. – Gene Nov 09 '21 at 04:32
  • @Gene I've updated the scripts, can you answer the updated question? – Sai Suman Chitturi Nov 09 '21 at 04:50
  • @ChitturiSaiSuman Your updated code iterates on `string` items by value, meaning it has to do dynamic allocations and copy all the content. You should be doing `for (const string &word : words)` (or `const auto &word`) – Cruz Jean Nov 09 '21 at 06:47
  • Hi, thanks for pointing it out. Yeah, I should have iterated the strings differently. I have one last doubt. Is there any way to beat the Python's Hash table? I want it to take a considerable amount of time to hash given keys. – Sai Suman Chitturi Nov 09 '21 at 07:31
  • Micro benchmarks like this are very hard to get right. Even when done right, it's rare that the difference is important. (You never mentioned what you're doing where this matters.) @CruzJean pointed out just one way the results can be skewed. Another is that your data is in sequential order. If the hash function happens to be pseudo-monotonic, you'll get a _much_ better memory access pattern than another that's random. That alone can make a factor of 10 or more difference. In all, it's doubtful that either algorithm uniformly "beats" the other across a collection of normal use cases. – Gene Nov 12 '21 at 04:20
  • You might have fun checking out this. https://github.com/eklitzke/associative-benchmark There are other similar efforts. Google "python dict vs c++ map" and similar. – Gene Nov 12 '21 at 04:38
  • The C++ equivalent of a Python dictionary is not `map` but `unordered_map`. – Mark Ransom Nov 15 '21 at 03:38
  • What about changing `for(string word : words)` to `for(string& word : words)`? – xskxzr Nov 15 '21 at 03:40
  • You should presize your C++ map, i.e., [this](https://www.cplusplus.com/reference/unordered_map/unordered_map/reserve/). – Richard Nov 15 '21 at 07:25
  • With the updates about finding hash-collisions I don't understand how that relates to the current code. Is the goal to find 5000 different strings such that djb2.operator() returns the same value for all of them? (Or the same value modulo something?) – Hans Olsson Nov 19 '21 at 08:29

3 Answers


The reserve call, as indicated in a comment, is the most important fix for unordered_map. Additionally, you should use a good implementation of unordered_map (not g++'s default one; VS 2019's is OK-ish) and consider the architecture you use.

With Microsoft tools (Python 3.7 x64 and VS 2019) on my computer, using x86 except for the last lines (each pair of numbers is the build time and lookup time in seconds):

  • Python 1.697 0.202
  • Original C++ map: 1.361 0.138
  • Original C++ unordered_map: 1.035 0.067
  • +const string&: 1.013 0.067
  • +reserve: 0.675 0.056
  • On x64 instead: 0.686 0.060 (so almost the same)

Using robin_hood (as suggested by @thc) isn't faster on x86 before you add reserve, and even then the difference isn't as dramatic as for gcc's unordered_map:

  • robin_hood unordered_map x86: 1.107 0.100
  • +const string& x86: 1.059 0.101
  • +reserve x86: 0.477 0.108

However, on x64 robin_hood is also faster at building, though not at lookup (and reserve is still important):

  • Original VS C++ unordered_map x64: 1.130 0.065
  • +const string& x64: 1.119 0.063
  • +reserve x64: 0.671 0.065
  • robin_hood unordered_map x64: 0.611 0.069
  • +const string& x64: 0.577 0.072
  • +reserve x64: 0.384 0.069
  • Original g++ C++ unordered_map x64: 1.639 0.135
  • +const string& x64: 1.611 0.135
  • +reserve x64: 0.946 0.128

Code fragment (note that, for fairness, it is important that reserve is inside the timed region):

auto start = high_resolution_clock::now();
unordered_map<string, int> freq;
freq.reserve(Q);
for(const string& word : words) {
    freq[word]++;
}
auto end = high_resolution_clock::now();

Added: there is also a deallocation cost for freq (and robin_hood seems faster there). It's just a fraction of the cost of building the map, but still significant.
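
For reference, the robin_hood variant only changes the map declaration. A minimal self-contained sketch, assuming the single-header robin_hood.h from https://github.com/martinus/robin-hood-hashing is on the include path, and using placeholder input data instead of Strings.txt:

#include <chrono>
#include <iostream>
#include <string>
#include <vector>
#include <robin_hood.h>  // assumed: single header from martinus/robin-hood-hashing

int main() {
    // Placeholder data; in the real benchmark `words` holds the 5e6 strings
    // read from Strings.txt.
    std::vector<std::string> words(5000000, std::string(10, 'a'));

    auto start = std::chrono::high_resolution_clock::now();
    robin_hood::unordered_map<std::string, int> freq;
    freq.reserve(words.size());  // presize, counted inside the timed region
    for (const std::string& word : words) {
        freq[word]++;
    }
    auto end = std::chrono::high_resolution_clock::now();
    std::cout << std::chrono::duration_cast<std::chrono::microseconds>(end - start).count() / 1e6 << '\n';
    return 0;
}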

Hans Olsson

I found the answer to this post myself. Python uses a randomized (keyed) hashing algorithm, SipHash, which makes it practically impossible to deliberately generate even two strings (consisting of entirely lowercase or entirely uppercase characters) that produce the same hash value.
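
The randomization itself is easy to observe. Here is a minimal sketch (it only assumes the interpreter can be re-invoked via sys.executable):

# hash_randomization.py -- each fresh CPython process draws a new SipHash key,
# so the same string hashes differently from run to run.
import os, subprocess, sys

cmd = [sys.executable, "-c", "print(hash('aaaaaaaaaa'))"]

# Two fresh interpreters with randomization on: the printed hashes differ
# with overwhelming probability.
env_rand = {k: v for k, v in os.environ.items() if k != "PYTHONHASHSEED"}
print(subprocess.check_output(cmd, text=True, env=env_rand).strip())
print(subprocess.check_output(cmd, text=True, env=env_rand).strip())

# Fixing PYTHONHASHSEED disables the randomization and makes hash() reproducible.
env_fixed = dict(os.environ, PYTHONHASHSEED="0")
print(subprocess.check_output(cmd, text=True, env=env_fixed).strip())
print(subprocess.check_output(cmd, text=True, env=env_fixed).strip())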

Now, coming to the performance issue of C++ vs Python 3: I was also counting the time taken for reading and writing, which is what made C++ look slower than Python 3.

When I fixed the code and made sure there was no additional overhead from reading/writing or copying (in the C++ code, I used const string& to refer to the map keys), I found that C++ runs faster than Python 3 when gp_hash_table is used.

Thanks to everyone who spent time/efforts working on my question.


It might be the copy construction in this for loop:

for(string word : words) {
    freq[word]++;
}

The "string word" copies a new string each iteration. Consider changing to access by reference as so:

for(const string& word : words) {
    freq[word]++;
}
penguins