I have around 10 million (and growing) users, each with an email and a phone number, both pointing to a user ID. I created two hashes, one for emails and one for phone numbers, like:
// A single user whose email and phone number both point to the same user ID
$redis->hSet('email-users', 'abc@xyz.com', 1);
$redis->hSet('phone-users', '+192938384849', 1);
Now, with millions of users, these hashes are growing huge, and I also want to look entries up in them, e.g. get the user ID for a given email from the email-users hash.
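That lookup by itself is a single hGet call (a minimal sketch with phpredis):

// O(1) field lookup in the big hash; hGet returns false if the field is missing
$userId = $redis->hGet('email-users', 'abc@xyz.com');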
I found that Redis stores small hashes in the compact ziplist encoding, so the recommended way to store a large map (dictionary) is to split it into smaller buckets of a fixed size, say at most 10,000 fields per hash.
So, if I divide my 10 million users into buckets of 10,000 fields, there would be around 1,000 hashes for emails and another 1,000 for phone numbers.
My question is: should I divide my users into these 1,000 buckets? And if yes, how can I search through them? Or is there a better alternative?
P.S. I am using PHP. Fetching all 1,000 hashes and looping through them would be quite resource-intensive, and I am afraid that the wrong approach would kill the performance Redis is known for.
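Which is why I assume the bucket number should be derived from the key itself, so a lookup never has to scan all the buckets. A rough sketch (the bucket count and key naming here are my own choice):

// Derive the bucket deterministically from the email itself
$bucketCount = 1000;
$bucket = crc32(strtolower('abc@xyz.com')) % $bucketCount; // crc32() is non-negative on 64-bit PHP
$redis->hSet("email-users:{$bucket}", 'abc@xyz.com', 1);
// A lookup recomputes the same bucket, so no scanning is needed
$userId = $redis->hGet("email-users:{$bucket}", 'abc@xyz.com');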
Just as a side note, I think we could use a consistent hashing algorithm like libketama to place keys on servers.
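Something along these lines, as a toy illustration of the idea (not real libketama; the server names are placeholders, and real implementations use far more virtual nodes):

// Toy ketama-style ring: each server gets several points on a hash ring
function buildRing(array $servers, int $replicas = 100): array {
    $ring = [];
    foreach ($servers as $server) {
        for ($i = 0; $i < $replicas; $i++) {
            $ring[crc32("$server#$i")] = $server; // virtual node for this server
        }
    }
    ksort($ring); // sort ring points so we can walk them clockwise
    return $ring;
}

function pickServer(array $ring, string $key): string {
    $point = crc32($key);
    foreach ($ring as $ringPoint => $server) {
        if ($ringPoint >= $point) {
            return $server; // first server point at or after the key's point
        }
    }
    return reset($ring); // wrap around to the start of the ring
}

$ring = buildRing(['redis-a', 'redis-b', 'redis-c']);
$server = pickServer($ring, 'abc@xyz.com');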
Also, if it is hard to work on letters, we could first convert each email to numbers, like a=1, b=2, c=3 ... z=26, with a 0 (zero) appended after each letter's number to keep it unambiguous, and + for the @ and . characters. For example:
abcd@gmail.com -> 10203040+701301090120+30150130
So now we have numbers, which makes it easier to apply any calculations.
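A quick sketch of that conversion (just illustrating the scheme above):

// a=1 ... z=26, each number followed by 0; @ and . become +
function emailToNumber(string $email): string {
    $out = '';
    foreach (str_split(strtolower($email)) as $ch) {
        if ($ch === '@' || $ch === '.') {
            $out .= '+';
        } elseif ($ch >= 'a' && $ch <= 'z') {
            $out .= (ord($ch) - ord('a') + 1) . '0';
        }
        // digits, hyphens, etc. are not covered by this scheme
    }
    return $out;
}

echo emailToNumber('abcd@gmail.com'); // 10203040+701301090120+30150130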