5

A was asked an interesting question on an interview lately.

  • You have 1 million users
  • Each user has 1 thousand friends
  • Your system should efficiently answer on Do I know him? question for each couple of users. A user "knows" another one, if they are connected through 6 levels of friends.

E.g. A is friend of B, B is a friend of C, C is friend of D, D is a friend of E, E is a friend of F. So we can say that, A knows F.

Obviously you can't to solve this problem efficiently using BFS or other standard traversing technic. The question is - how to store this data structure in DB and how to quickly perform this search.

Cœur
  • 37,241
  • 25
  • 195
  • 267
silent-box
  • 1,649
  • 3
  • 21
  • 40
  • 4
    I *guess* the probability for 'yes' is about 99.99999%, so maybe you could hardcode `return yes`, but I'll wait to see an answer. – alain Sep 02 '17 at 15:31
  • 2
    Possible duplicate of [Challenge,how to implement an algorithm for six degree of separation?](https://stackoverflow.com/questions/2076715/challenge-how-to-implement-an-algorithm-for-six-degree-of-separation) – sascha Sep 02 '17 at 15:40
  • @RoryDaulton no, sorry. fixed that – silent-box Sep 02 '17 at 15:52
  • @sascha Actually I proposed traversal algorithms and Graph databases, but my interviewer said that is not acceptable (linear time for 1E18 elements is too slow). Unfortunately I didn't find anything else on the link you suggested – silent-box Sep 02 '17 at 15:55
  • @alain Nice idea! Do you know how to prove that mathematically? – silent-box Sep 02 '17 at 15:56
  • And who says you can beat linear time (you probably need to formalize the problem a bit more; with some setup-phase, you can pre-compute everything and answer by searching a hash-table which solves the task)? One small improvement can be bidirectional-search, but that's some kind of trade-off. – sascha Sep 02 '17 at 15:56
  • Hmm maybe it's simply `1 - 1/(1000^6)` but I'm not 100% sure... – alain Sep 02 '17 at 16:01
  • @sascha The task was about a real-world system, where store and update 1E18 elements cache for each user is not an option (I proposed that too, but it was quickly rejected). Interviewer said that the problem was specifically designed to have only one solution – silent-box Sep 02 '17 at 16:02
  • 1
    *Interviewer said that the problem was specifically designed to have only one solution*: This is the moment i would be worried about the competence on the other side (but maybe you did skip some further assumptions, who knows). – sascha Sep 02 '17 at 16:04
  • There is a data structure called disjointed set, which can solve this kind of problem really quickly. Disjoint set maintained set which the elements inside it are connected. You can make set,merge set, and query the set. I looked up Introduction to algorithms(CLRS), it said that it can solve this problem in about linear time – codecrazer Sep 02 '17 at 15:58
  • `... connected through 6 levels of friends.` DYM: exactly 6 handshakes, or maximum 6? – wildplasser Sep 02 '17 at 19:32

2 Answers2

7

What's wrong with BFS?

Execute three steps of BFS from the first node, marking accessible users by flag 1. It requires 10^9 steps.

Execute three steps of BFS from the second node, marking accessible users by flag 2. If we meet mark 1 - bingo.

MBo
  • 77,366
  • 5
  • 53
  • 86
0

What about storing the data as 1 million x 1 million matrix A where A[i][j] is the minimum number of steps to reach from user i to user j. Then you can query it almost instantly. The update however is more costly.

algrid
  • 5,600
  • 3
  • 34
  • 37