STL implementations are often not perfect in terms of performance (no holy wars, please).
If you know a guaranteed and sensible upper bound on the number of unique elements (N), then you can trivially implement your own hash table with a power-of-two size 2^s >> N. Here is how I usually do it myself:
int size = 1;
while (size < 3 * N) size <<= 1;
//Note: at least 3x size factor, size = power of two
//count = -1 means empty entry
std::vector<std::pair<X, int>> table(size, std::make_pair(X(), -1));
auto GetHash = [size](X val) -> int { return (int)(std::hash<X>()(val) & (size - 1)); };
for (auto x : input) {
    int cell = GetHash(x);
    bool ok = false;
    for (; table[cell].second >= 0; cell = (cell + 1) & (size - 1)) {  //linear probing
        if (table[cell].first == x) {  //match found -> stop
            ok = true;
            break;
        }
    }
    if (!ok) {  //match not found -> add entry in the free cell
        table[cell].first = x;
        table[cell].second = 0;
    }
    table[cell].second++;  //increment counter
}
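If you also need to read the counts back out afterwards, the lookup mirrors the insertion loop, and iterating over the occupied cells gives you all distinct values. A minimal sketch (process() is just a placeholder for whatever you do with the results):
//Look up the count of a single value (0 if it never occurred)
auto GetCount = [&](X x) -> int {
    int cell = GetHash(x);
    for (; table[cell].second >= 0; cell = (cell + 1) & (size - 1))
        if (table[cell].first == x) return table[cell].second;
    return 0;
};
//Enumerate all distinct values: every cell with count >= 0 is occupied
for (int i = 0; i < size; i++)
    if (table[i].second >= 0)
        process(table[i].first, table[i].second);  //process() is a placeholder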
On MSVC2013, this hand-rolled table improves the time from 0.62 s to 0.52 s compared to your code, with int used as type X.
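For reference, here is roughly how such a comparison can be measured. This is only a sketch under my assumptions: I'm guessing the baseline in your code is an std::unordered_map<X, int>, and the test data here is made up; adjust both to your actual setup.
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <unordered_map>
#include <vector>

int main() {
    typedef int X;
    std::vector<X> input(10000000);  //made-up test data
    for (size_t i = 0; i < input.size(); i++)
        input[i] = std::rand() % 100000;

    auto t0 = std::chrono::steady_clock::now();
    std::unordered_map<X, int> counts;  //assumed baseline
    for (size_t i = 0; i < input.size(); i++)
        counts[input[i]]++;
    auto t1 = std::chrono::steady_clock::now();
    std::printf("unordered_map: %.3f s\n", std::chrono::duration<double>(t1 - t0).count());
    //then run the hand-rolled table on the same input and time it the same way
    return 0;
}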
Also, we can choose a faster hash function. Note, however, that the choice of hash function depends heavily on the properties of the input. Let's take Knuth's multiplicative hash:
auto GetHash = [size](X val) -> int { return (int)((val * 2654435761u) & (size - 1)); };
It further improves time to 0.34 secs.
In conclusion: do you really want to reimplement standard data structures to achieve a 2x speedup?
Notes: the speedup may be entirely different on another compiler/machine. You may need some hacks if your type X is not POD, as sketched below.
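One such hack (a sketch only, not the only option): keep the keys in a separate packed vector, so the table itself never has to default-construct X for the empty cells; each table entry then stores only a key index and a count.
std::vector<X> keys;  //only actually inserted keys live here
std::vector<std::pair<int, int>> table(size, std::make_pair(-1, -1));  //(key index, count)
for (const auto &x : input) {
    int cell = GetHash(x);
    for (; table[cell].second >= 0; cell = (cell + 1) & (size - 1))
        if (keys[table[cell].first] == x) break;  //match found -> stop
    if (table[cell].second < 0) {  //match not found -> append key to packed storage
        keys.push_back(x);
        table[cell] = std::make_pair((int)keys.size() - 1, 0);
    }
    table[cell].second++;  //increment counter
}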