I have about 20,000,000 pair<int, int>s which I need to associate with ints. I did so with an unordered_map<pair<int, int>, int>. Profiling my algorithm shows that checking whether an entry exists or not
bool exists = myMap[make_pair(a, b)] != NULL
is the performance bottleneck. I thought that retrieving this information from an unordered_map would be really fast, since lookups are O(1) on average. But constant time can be slow if the constant is big...
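One caveat about the check above: operator[] value-initializes and inserts a zero entry whenever the key is missing, so every negative lookup also grows the map. An existence check based on find() avoids that side effect; a minimal sketch, assuming C++11's std::unordered_map (my code uses the tr1 variant, which behaves the same way) and a hypothetical PairHash functor standing in for my hash specialization below:

#include <cstddef>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for the tr1::hash specialization shown below.
struct PairHash {
    std::size_t operator()(const std::pair<int, int>& x) const {
        return static_cast<std::size_t>(x.first) * 100003u
             + static_cast<std::size_t>(x.second);
    }
};

std::unordered_map<std::pair<int, int>, int, PairHash> myMap;

// Unlike operator[], find() never inserts on a miss.
bool exists(int a, int b) {
    return myMap.find(std::make_pair(a, b)) != myMap.end();
}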
My hash function is
template <>
struct tr1::hash<pair<int, int> > {
public:
    size_t operator()(pair<int, int> x) const throw() {
        size_t h = x.first * 1 + x.second * 100000;
        return h;
    }
};
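Since all my keys are below about 2^25, I also wonder whether packing both ints into a single 64-bit word would make a better hash; on a 64-bit platform it would be collision-free. A sketch, assuming non-negative keys and a 64-bit size_t (PackedPairHash is just an illustrative name):

#include <cstddef>
#include <cstdint>
#include <utility>

// Collision-free on 64-bit platforms: each int fits in 32 bits.
struct PackedPairHash {
    std::size_t operator()(const std::pair<int, int>& x) const {
        std::uint64_t h =
            (static_cast<std::uint64_t>(static_cast<std::uint32_t>(x.first)) << 32)
            | static_cast<std::uint32_t>(x.second);
        return static_cast<std::size_t>(h);
    }
};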
Do you know any better data structure for my problem?
Obviously I can't just store the information in a matrix, because the amount of memory wouldn't fit into any computer in existence: a dense 20,000,000 × 20,000,000 matrix of 4-byte ints would take about 1.6 petabytes. All I know about the distribution is that myMap[make_pair(a, a)] doesn't exist for any a, and that all ints are in a contiguous range from 0 to about 20,000,000.
Think of it as a sparse 20,000,000 × 20,000,000 matrix with about 20,000,000 entries, but never on the main diagonal.
Suggestion
Would a vector<pair<int, int>>* (an array with N entries) be expected to be faster? The lookup for a would be trivial (just the index into the array), and then I would iterate through the vector, comparing the first value of each pair to b.
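A sketch of what I have in mind (N, the (b, value) bucket layout, and the exists helper are my own illustrative choices):

#include <cstddef>
#include <utility>
#include <vector>

const int N = 20000000;  // both key components are in [0, ~20,000,000)

// table[a] holds (b, value) pairs for every entry whose first key is a.
std::vector<std::pair<int, int> >* table =
    new std::vector<std::pair<int, int> >[N];

bool exists(int a, int b) {
    const std::vector<std::pair<int, int> >& bucket = table[a];
    for (std::size_t i = 0; i < bucket.size(); ++i)
        if (bucket[i].first == b)   // a value lookup would return .second
            return true;
    return false;
}

With about 20,000,000 entries spread over N buckets, the average bucket holds a single element, so the linear scan should be short.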
BIG UPDATE
I uploaded the raw data so you can see the structure.