13

I am trying to build a directed graph and compute personalized page rank over this graph. So suppose I have a graph with vertices {1,2,3,4} and edges going from 2, 3, and 4 to vertex 1, I would like to:

(1) compute the personalized page rank of every vertex with respect to 1

(2) compute the personalized page rank of every vertex with respect to 2.

The question is how I should pass this option in the personalized page rank function. The following code does not seem to do what I want:

import networkx as nx

G = nx.DiGraph()

[G.add_node(k) for k in [1,2,3,4]]
G.add_edge(2,1)
G.add_edge(3,1)
G.add_edge(4,1)


ppr1 = nx.pagerank(G,personalization={1:1, 2:1, 3:1, 4:1})
ppr2 = nx.pagerank(G,personalization={1:2, 2:2, 3:2, 4:2})

Right now ppr1 == ppr2, even though it should not be the case.

================================================================== UPDATE.

In response to comment below, my understanding of personalized page rank comes from the following:

An equivalent definition is in terms of the terminal node of a random walk starting from s. Let (X0, X1, . . . , XL) be a random walk starting from X0 = s of length L ∼ Geometric(α). Here by L ∼ Geometric(α) we mean Pr[L = ] = (1−α) α. This walk starts at s and does the following at each step: with probability α, terminate; and with the remaining probability 1 − α, continue to a random out-neighbor of the current node. Here if the current node is u, the random neighbor v ∈ N out(u) is chosen with probability wu,v if the graph is weighted or with uniform probability 1/dout(u) if the graph is unweighted. Then the PPR of any node t is the probability that this walk stops at t:

Found on page 6 of this thesis: https://cs.stanford.edu/people/plofgren/bidirectional_ppr_thesis.pdf

So I suppose what I am looking for when computing "the personalized page rank of t with respect to s" is if we start a random walk from s according to the process described above, what is the probability that this walk terminates at t.

xiaolingxiao
  • 4,793
  • 5
  • 41
  • 88
  • 1
    You'll have to explain what you mean by "personalized with respect to..." In PageRank there is a possibility to jump uniformly to a random page. The `personalization` in networkx allows for that jump to have different probabilities of landing at different pages. In your first case all pages get weight 1, so the jump is uniform. In the second case all pages get weight 2, so again the jump is uniform. So both give the same result (and it would be the same if you didn't assign weights at all). – Joel Apr 04 '17 at 01:36
  • @Joel just added more info in question. – xiaolingxiao Apr 04 '17 at 02:08

1 Answers1

11

In the conceptualization of PageRank, a random surfer is moving around following links. At each step there is a nonzero probability the surfer goes to a random page (as opposed to following a link). If the choice of that random page is weighted, then it is referred to as personalized PageRank.

In your case you want that jump to be to a single specific page. So you need to tell it that all the other pages have zero probability of being selected in those steps when the surfer jumps rather than following an edge.

ppr1 = nx.pagerank(G,personalization={1:1, 2:0, 3:0, 4:0})
ppr2 = nx.pagerank(G,personalization={1:0, 2:1, 3:0, 4:0})
Joel
  • 22,598
  • 6
  • 69
  • 93
  • Joel and original poster, the transition to a target node can contain different transitions on the path. Does this personalization mean that the transitions are always to the target node or does it mean that it will transition to other nodes first and eventually halts at target node? – Mina Nov 01 '21 at 21:12
  • Hi - I'm not sure what your question is asking. So let me clarify the algorithm. An individual is randomly following links through webpages. Every once in a while, he/she doesn't follow a link, but instead jumps to a page with probabilities based on the personalization weights. The surfer never halts. The algorithm produces the probabilities of being at each page after many, many steps. – Joel Nov 02 '21 at 23:32