6

I am developing an online service that is nothing like Tinder, but understanding how their algorithm works would help me learn to scale well.

My assumption is that each of their users has a `skipped` set of users they have already seen. I don't know exactly what they use; it may not be Redis, but there is a page that roughly explains an implementation in Redis (not essential, but informative).

I'm assuming a user is selected at random from `users` on the server (let's assume we don't care about age, gender, etc., just people from your location), and the server then checks whether this user is in the `skipped` set.
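If that's roughly the flow, here is a minimal sketch of the pick-and-check loop (my own illustration of the assumption above, not Tinder's actual code):

```python
import random

def pick_candidate(users, skipped, max_tries=1000):
    """Rejection sampling: draw a random user and retry while the draw
    is in the skipped set. Returns None if every try hit the skipped
    set (i.e. nearly everyone has been seen)."""
    for _ in range(max_tries):
        candidate = random.choice(users)
        if candidate not in skipped:
            return candidate
    return None
```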

Now, what happens if someone manually, or with a bot, skips every single user? Doesn't this process begin to lag as the size of `skipped` approaches the size of `users`? Eventually every random pick will already have been skipped, and the process gets stuck in a loop, drawing a new random user each time only to find that all of them have been seen.

How do they keep this process fast? There must be more to it than a simple limit on how many people you can skip, because you might skip a lot of people who never come back online, leaving your `skipped` set bigger than the pool of available users.
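To put a rough number on that worry: if a fraction `a` of users is already skipped, a uniform random draw succeeds with probability `1 - a`, so the expected number of draws follows a geometric distribution. A quick back-of-the-envelope sketch:

```python
# Expected number of random draws before hitting an unskipped user,
# assuming a fraction `a` of users is already in the skipped set.
# Success probability per draw is 1 - a, so the expected count is
# 1 / (1 - a) (mean of a geometric distribution).
def expected_tries(a):
    return 1.0 / (1.0 - a)

for a in (0.5, 0.9, 0.99, 0.999):
    print(f"{a:.1%} skipped -> ~{expected_tries(a):.0f} tries")
```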

baao
  • 71,625
  • 17
  • 143
  • 203
user5163183
  • 103
  • 3
  • 1
    https://en.wikipedia.org/wiki/Bloom_filter might be useful. – Roger Lipscombe Jul 29 '15 at 09:15
  • 1
    @RogerLipscombe I don't think the question is about how to maintain the `skipped` set. It's about how to randomly select an element from the set `users \ skipped` even if `skipped` is large – Niklas B. Jul 29 '15 at 09:18
  • @NiklasB. Yes that's true, the `skipped` set / bloom filter is already handled by `redis`. – user5163183 Jul 29 '15 at 09:22
  • FWIW, you might be overestimating the expected search time when `skipped` is large. Say |skipped| = a * |users| for some 0 <= a < 1. Then the expected time of the Las Vegas random-selection algorithm is O(1 / (1 - a)), which is not at all bad even for largish a. For example, if 99% of users have already been visited, you will still need only about 100 random tries until you find one who hasn't. – Niklas B. Jul 29 '15 at 09:37
  • I suspect that users >>> skipped, so just pick one at random and use the bloom filter. It doesn't have to be perfect: if you're a human (Tinder's target market), you're not going to skip _every_ other user, and even if you skip a large number, you're not going to remember every single one. – Roger Lipscombe Jul 29 '15 at 10:42
  • @RogerLipscombe I was more wondering whether performance would begin to decrease as users skip others over a long period of time. On top of that, bots might come in and skip every user to try to attack and slow down the server; there would have to be a check for that. – user5163183 Jul 29 '15 at 10:56
  • 2
    For a Tinder-like use case, I'd simply remember the last N skipped entries for each user. Set N to (sticks finger in the air) 5000. As I say, these are humans; they won't remember. – Roger Lipscombe Jul 29 '15 at 11:09
  • @RogerLipscombe Good advice; people won't remember after a while, so this would cut down on memory. – user5163183 Jul 29 '15 at 11:19
  • 2
    They might even have changed their taste in potential partners by the time they've gone through 5000 candidates... – Niklas B. Jul 29 '15 at 13:27
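The "remember only the last N skips" idea from the comments could be sketched like this (my own illustration; the `RecentSkips` name and the cap of 5000 are assumptions, not anything Tinder has documented):

```python
from collections import OrderedDict

class RecentSkips:
    """Keep only the most recent `maxlen` skipped user IDs.

    Old skips fall out as new ones come in, so memory stays bounded
    no matter how long someone (or a bot) keeps swiping."""

    def __init__(self, maxlen=5000):
        self.maxlen = maxlen
        self._seen = OrderedDict()

    def add(self, user_id):
        self._seen.pop(user_id, None)       # refresh position if already present
        self._seen[user_id] = True
        if len(self._seen) > self.maxlen:
            self._seen.popitem(last=False)  # evict the oldest skip

    def __contains__(self, user_id):
        return user_id in self._seen
```

In Redis, a per-user cap like this is commonly kept with `LPUSH` followed by `LTRIM` to the last N entries.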

0 Answers