TL;DR: Hash the keys, sort the hashes, and store the sorted hashes in compressed form.
The algorithm I propose may be considered an extension of the solution to a similar (simpler) problem.
- To each key, apply a hash function that maps keys to integers in the range [0..h]. Starting with h = 2 * number_of_keys seems reasonably good.
- Fill all available memory with these hashes.
- Sort the hashes.
- If a hash value is unique, write it to the list of unique hashes; otherwise remove all copies of it and write it to the list of duplicates. Keep both lists in compressed form: as differences between adjacent values, compressed with an optimal entropy coder (such as an arithmetic coder, range coder, or ANS coder). If the list of unique hashes was not empty, merge it with the sorted hashes; additional duplicates may be found while merging. If the list of duplicates was not empty, merge the new duplicates into it.
- Repeat steps 1..4 while there are any unprocessed keys.
- Read the keys several more times, performing steps 1..5, but ignore all keys that are not in the list of duplicates from the previous pass. Use a different hash function for each pass (for everything except matching against the list of duplicates from the previous pass, which means we need to sort hashes twice, for 2 different hash functions).
- Read the keys again to convert the remaining list of duplicate hashes into a list of plain keys. Sort it.
- Allocate an array of 2 billion bits.
- Use all unoccupied memory to construct an index for each compressed list of hashes. This could be a trie or a sorted list. Each index entry should contain a "state" of the entropy decoder, which allows decoding the compressed stream without starting from the very beginning.
- Process the list of posts and update the array of 2 billion bits.
- Read the keys once more to convert hashes back to keys.
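A single pass (steps 1..4) can be sketched roughly as follows. This is only an illustration of the idea, not the real implementation: the salted SHA-256 hash and the plain delta lists are stand-ins, since the actual algorithm would use a fast hash and entropy-code the gaps (e.g. with FSE):

```python
import hashlib

def hash_key(key, h, salt=0):
    # Stand-in hash: map a key into [0..h) using a salted digest.
    # Each pass would use a different salt (i.e. hash function).
    digest = hashlib.sha256(f"{salt}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % h

def one_pass(keys, h, salt=0):
    # Steps 1..4 for one batch: hash, sort, then split hash values
    # into unique ones and duplicated ones (copies removed).
    hashes = sorted(hash_key(k, h, salt) for k in keys)
    unique, dups = [], []
    i = 0
    while i < len(hashes):
        j = i
        while j < len(hashes) and hashes[j] == hashes[i]:
            j += 1
        (unique if j - i == 1 else dups).append(hashes[i])
        i = j
    return unique, dups

def delta_encode(sorted_vals):
    # Store a sorted list as gaps between adjacent values; the real
    # algorithm compresses these gaps with an entropy coder.
    prev, gaps = -1, []
    for v in sorted_vals:
        gaps.append(v - prev)
        prev = v
    return gaps
```

The merging with lists from previous iterations (also part of step 4) is omitted here, but it is an ordinary merge of sorted sequences.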
While the value h = 2*number_of_keys seems reasonably good, we could try varying it to optimize space requirements (setting it too high decreases the compression ratio; setting it too low results in too many duplicates).
This approach does not guarantee a result: it is possible to invent 10 bad hash functions so that every key is duplicated on every pass. But with high probability it will succeed, and most likely it will need about 1 GB of RAM (because most compressed integer values are in the range [1..8], so each key contributes about 2..3 bits to the compressed stream).
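The "2..3 bits per key" figure can be sanity-checked: gaps between adjacent sorted uniform hashes are approximately geometrically distributed with success probability p = n/h, and an optimal entropy coder approaches that distribution's entropy. The estimate below models only the unique-hash list (ignoring duplicate lists and coder overhead), so it is a rough lower bound:

```python
import math

def gap_entropy_bits(p):
    # Entropy in bits of a geometric distribution with success
    # probability p: H(p) / p, where H is the binary entropy.
    # For sorted uniform hashes, p is approximately n / h.
    h2 = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    return h2 / p

# Approximate bits per unique hash for several h/n ratios.
for ratio in (2.0, 2.1, 1.2):
    print(ratio, round(gap_entropy_bits(1.0 / ratio), 2))
```

For h/n = 2 this gives exactly 2 bits per hash, consistent with the 2..3 bits claimed above.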
To estimate space requirements precisely, we could use either a (complicated?) mathematical proof or a complete implementation of the algorithm (also pretty complicated). But to obtain a rough estimate, we can use a partial implementation of steps 1..4. See it on Ideone. It uses a variant of ANS coder named FSE (taken from here: https://github.com/Cyan4973/FiniteStateEntropy) and a simple hash function implementation (taken from here: https://gist.github.com/badboy/6267743). Here are the results:
| | 10 key list loads | 20 key list loads |
|---|---|---|
| Optimal h/n | 2.1 | 1.2 |
| Bits per key | 2.98 | 2.62 |
| Compressed MB | 710.851 | 625.096 |
| Uncompressed MB | 40.474 | 3.325 |
| Bitmap MB | 238.419 | 238.419 |
| MB used | 989.744 | 866.839 |
| Index entries | 1'122'520 | 5'149'840 |
| Indexed fragment size | 1781.71 | 388.361 |
With the original OP limitation of 10 key scans, the optimal value for the hash range is only slightly higher (2.1) than my guess (2.0), and this parameter is very convenient because it allows using 32-bit hashes (instead of 64-bit ones). The required memory is slightly less than 1 GB, which allows using pretty large indexes (so step 10 would not be very slow). There is one small problem: these results show how much memory is consumed at the end, but in this particular case (10 key scans) we temporarily need more than 1 GB of memory while performing the second pass. This may be fixed if we drop the results (unique hashes) of the first pass and recompute them later, together with step 7.
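The index from step 9 can be illustrated with a toy version: checkpoint every k-th entry of a delta-encoded list with its decoded value and stream position, then answer membership queries by bisecting the checkpoints and decoding forward. Plain delta lists stand in for the compressed stream here; with a real entropy coder each checkpoint would also carry the decoder state, as described in step 9:

```python
import bisect

def build_checkpoints(gaps, every=4):
    # Walk a delta-encoded sorted list and remember, for every
    # `every`-th entry, (decoded value, position in the stream).
    # A real index over an entropy-coded stream would also store
    # the decoder state at each checkpoint.
    checkpoints, value = [], -1
    for pos, g in enumerate(gaps):
        value += g
        if pos % every == 0:
            checkpoints.append((value, pos))
    return checkpoints

def contains(gaps, checkpoints, target):
    # Bisect to the last checkpoint not above target, then decode
    # forward instead of decoding from the start of the stream.
    i = bisect.bisect_right([v for v, _ in checkpoints], target) - 1
    if i < 0:
        return False
    value, pos = checkpoints[i]
    for g in gaps[pos + 1:]:
        if value >= target:
            break
        value += g
    return value == target
```

The "Indexed fragment size" row above is exactly the amount of decoding done between two checkpoints; more index entries mean shorter fragments and faster lookups in step 10.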
With the less tight limitation of 20 key scans, the optimal value for the hash range is 1.2, which means the algorithm needs much less memory and allows more space for indexes (so that step 10 would be almost 5 times faster).
Loosening the limitation to 40 key scans does not result in any further improvements.