
I'm searching for the Big-O complexity of the PageRank algorithm. I could hardly find anything; I only found O(n+m) (n = number of nodes, m = number of arcs/edges), but so far I don't believe this complexity.

I think it is missing the convergence criterion. I don't think the number of iterations is a constant; I suspect convergence depends on the graph diameter. That said, it would be enough to have the Big-O for one iteration; then convergence doesn't matter.

Nevertheless, PageRank needs to touch every node and aggregate every incoming rank, so I expected a runtime of O(n * m).

Did I miss something? Does anyone know a reliable source for the Big-O complexity of PageRank?

Thanks in advance.

Martin Thorsen Ranang
Matthias Kricke
  • `n+m` sounds about right to me, possibly with some extra `log` terms thrown in because it needs to look stuff up. Why do you expect `O(n*m)`? That is, what is it that you expect PageRank needs to do for *every* pair of (web page, link) on the entire internet? Why should it touch its records for every single page, every time a link is found? I know they say that PageRank disperses arbitrarily far, but I don't think that means they recalculate the whole internet every time I create an `<a>` tag. – Steve Jessop Sep 18 '12 at 09:56
  • If I get it right, then you are referring to an existing PageRank. I refer to a situation where you get a graph and want to calculate PageRank. I think it is `O(n*m)` because one has to touch all nodes and, in the worst case, all edges (complete graph), and that would be `O(n*m)`. And you are right, if one already has a PageRank there are some tricks to avoid recalculating all PageRanks. But initially all nodes have the same PageRank, which would be 1/n; first one needs convergence of the algorithm, or at least some iterations, to get a better value and handle newly found pages. – Matthias Kricke Sep 18 '12 at 10:11
  • I don't see what you're saying. There are `n` nodes and `m` edges, so why is "touching all nodes and in worst case all edges" `O(n*m)`? I'm pretty sure that the "tricks" start right from the beginning, so the algorithm does *not* have to touch each node `m` times to get started. IIRC, Google used to re-calculate PageRank from scratch approx. daily, but doesn't any more. In practice all it does is update, although I'm sure they have the means to bootstrap it for test purposes etc. – Steve Jessop Sep 18 '12 at 10:25
  • I don't know why you are so concerned about Google... Google isn't the only one interested in PageRank. The algorithm is useful for a lot of other applications as well, and the runtime complexity has nothing to do with that either. The point for `O(n*m)`: you have to calculate the PR for every node. In one operation you push the current PageRank to all your neighbours. This means for every node (which is n) we push a value to all adjacent nodes (which is m in a complete graph) and aggregate the values there (constant time); after all pushes are done, one would have the new PageRank of all nodes. – Matthias Kricke Sep 18 '12 at 10:47
  • Don't get me wrong, I can imagine that `O(n+m)` is possible, but your argument doesn't show the trick used to achieve it. – Matthias Kricke Sep 18 '12 at 10:58
  • Be aware that PageRank is not exactly an algorithm, but a property of a graph. There are actually dozens of algorithms to compute it in various contexts and using various techniques. – phs Nov 10 '15 at 22:21

1 Answer


After some research and further thinking, I have come to the conclusion that O(n+m) is correct.

Even in a complete graph, each edge is touched at most twice, once from each endpoint. My mistake was thinking that every node has to touch every edge. In fact one has to visit every node, which is n steps, and traverse each edge at most twice, which is O(m). So one iteration costs O(n + m) in total.
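The counting argument above can be sketched in code. This is a minimal, illustrative implementation of one power-iteration step over an adjacency list (the graph, function name, and damping value are my own choices, not from any particular source); the outer loop visits each node once and the inner loop visits each edge once, so a single iteration is O(n + m).

```python
def pagerank_step(out_links, rank, damping=0.85):
    """One PageRank power-iteration step.

    out_links: dict mapping node -> list of successor nodes.
    rank: dict mapping node -> current rank (should sum to 1).
    Assumes every node has at least one out-link (no dangling nodes).
    """
    n = len(rank)
    # Base teleportation mass for every node: O(n).
    new_rank = dict.fromkeys(rank, (1 - damping) / n)
    for node, targets in out_links.items():     # touches each of the n nodes once
        share = damping * rank[node] / len(targets)
        for t in targets:                       # touches each of the m edges once
            new_rank[t] += share
    return new_rank

# Toy 3-node graph: A -> B, A -> C, B -> C, C -> A.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
rank = {v: 1 / 3 for v in graph}
for _ in range(50):
    rank = pagerank_step(graph, rank)
print(rank)  # ranks still sum to 1; C collects the most rank here
```

How many iterations are needed for convergence is a separate question (it depends on the damping factor and the graph), which is why quoting the cost per iteration is the cleaner statement.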

Matthias Kricke
  • Could you please share some resources which explain the time complexity of finding the PageRank? I am having trouble understanding how the repeated matrix multiplication (of the transition matrix and the distribution vector) can be done in less than quadratic time. – gvas Mar 27 '17 at 02:44
  • @gvas The trick is that you don't do it that way; you reduce it to something faster. – Chris Apr 23 '18 at 16:49
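The point in the last two comments can be made concrete. A dense matrix-vector multiply is O(n²), but the transition matrix has only m nonzero entries (one per link), so storing just those entries makes each multiply O(n + m). A minimal sketch, assuming a coordinate-list sparse format (the function name and example graph are illustrative):

```python
def sparse_matvec(n, entries, x):
    """Multiply an n x n sparse matrix by vector x.

    entries: list of (row, col, value) triples -- only the m nonzeros.
    Cost: O(n) to initialize the result plus O(m) for the nonzeros.
    """
    y = [0.0] * n            # O(n)
    for r, c, v in entries:  # O(m): one multiply-add per stored entry
        y[r] += v * x[c]
    return y

# Column-stochastic transition matrix for the 2-node graph A <-> B:
# column j holds 1/outdeg(j) in the rows that j links to.
entries = [(1, 0, 1.0),  # A (col 0) links to B (row 1)
           (0, 1, 1.0)]  # B (col 1) links to A (row 0)
print(sparse_matvec(2, entries, [0.5, 0.5]))  # → [0.5, 0.5]
```

This is the same edge-counting idea as the answer above, just phrased in linear-algebra terms: the "matrix multiplication" never materializes the n² zero entries.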