Let
| 0 0 0 1 0 |
| 0 0 0 1 0 |
| 0 0 0 1 1 |
| 1 1 1 0 0 |
| 0 0 1 0 0 |
be the adjacency matrix of a graph, where entry (i, j) is 1 iff page i links to page j. The transition matrix M
used by PageRank is then
| 0 0 0 1/3 0 |
| 0 0 0 1/3 0 |
| 0 0 0 1/3 1 |
| 1 1 1/2 0 0 |
| 0 0 1/2 0 0 |
which is column stochastic and irreducible. (It is not aperiodic, though: every edge runs between {1, 2, 3} and {4, 5}, so the chain has period 2. The teleportation/damping step of full PageRank is what restores aperiodicity; the bare power iteration below works with M directly.)
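The normalization from A to M can be sketched in a few lines of Python (a minimal sketch; the names A, outdeg, and M are just illustrative):

```python
# Build the column-stochastic transition matrix M from the adjacency
# matrix A above (A[i][j] = 1 iff page i links to page j).
# M[i][j] = A[j][i] / outdegree(j): column j spreads page j's rank
# evenly over its out-links.

A = [
    [0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 1, 1],
    [1, 1, 1, 0, 0],
    [0, 0, 1, 0, 0],
]

n = len(A)
outdeg = [sum(row) for row in A]  # out-degrees: [1, 1, 2, 3, 1]
M = [[A[j][i] / outdeg[j] for j in range(n)] for i in range(n)]

# Every column sums to 1, i.e. M is column stochastic.
assert all(abs(sum(M[i][j] for i in range(n)) - 1.0) < 1e-9 for j in range(n))
```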
MapReduce starts from here. The serialized input to the mappers looks like
1 -> 4
2 -> 4
3 -> 4 , 5
4 -> 1 , 2 , 3
5 -> 3
and the mappers will emit the following:
< 1 , [4] >
< 4 , 1 >
< 2 , [4] >
< 4 , 1 >
< 3 , [4 , 5] >
< 4 , 1/2 >
< 5 , 1/2 >
< 4 , [1, 2, 3] >
< 1 , 1/3 >
< 2 , 1/3 >
< 3 , 1/3 >
< 5 , [3] >
< 3 , 1 >
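A mapper along these lines might look as follows (a Python sketch; map_page is a made-up name, and the input is assumed to be the arrow-separated lines above). For each page it re-emits the adjacency list, which the next iteration needs, plus one contribution PR(p)/outdegree(p) per out-link; PR is 1 for every page in this first pass, matching the listing above:

```python
def map_page(line, pr=1.0):
    """Emit (key, value) pairs for one serialized input line."""
    src, _, rest = line.partition("->")
    src = int(src)
    targets = [int(t) for t in rest.split(",")]
    pairs = [(src, targets)]           # pass the link structure along
    share = pr / len(targets)          # contribution per out-link
    pairs += [(t, share) for t in targets]
    return pairs

# map_page("3 -> 4 , 5") returns [(3, [4, 5]), (4, 0.5), (5, 0.5)]
```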
Mapper outputs will be grouped by key and handed to the reducers. With 5 reducers this looks like:
R1 takes [4] , 1/3 then computes 1/5*(1/3) = 2/30
R2 takes [4] , 1/3 then computes 1/5*(1/3) = 2/30
R3 takes [4, 5] , 1/3 , 1 then computes 1/5*(1/3 + 1) = 8/30
R4 takes [1, 2, 3] , 1 , 1 , 1/2 then computes 1/5*( 1 + 1 + 1/2) = 15/30
R5 takes [3] , 1/2 then computes 1/5*(1/2) = 3/30
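The reduce step can be sketched the same way (an illustrative sketch: reduce_page is a made-up name, and the grouped values are assumed to mix the page's adjacency list with numeric contributions, exactly as listed above). The leading 1/5 is the initial uniform rank, factored out just as in the hand computation:

```python
def reduce_page(key, values, init_rank=1/5):
    """Sum a page's rank contributions and pass its link list through."""
    links = []
    total = 0.0
    for v in values:
        if isinstance(v, list):   # the adjacency list, passed through
            links = v
        else:                     # a rank contribution from some in-link
            total += v
    return key, links, init_rank * total

# reduce_page(4, [[1, 2, 3], 1, 1, 0.5]) returns (4, [1, 2, 3], 0.5), i.e. 15/30
```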
Now the first (power) iteration is over. In the following iterations the mappers emit the same pairs as before, except that each contribution uses the PR value just computed instead of 1 (so the reducers now simply sum, without the extra 1/5 factor):
< 1 , [4] >
< 4 , 2/30 >
< 2 , [4] >
< 4 , 2/30 >
< 3 , [4 , 5] >
< 4 , 4/30 >
< 5 , 4/30 >
< 4 , [1, 2, 3] >
< 1 , 5/30 >
< 2 , 5/30 >
< 3 , 5/30 >
< 5 , [3] >
< 3 , 3/30 >
Repeat these map-reduce iterations until the ranks converge enough, or until you are satisfied.
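The whole loop can be condensed into a single-process sketch. One caveat: the hand computation above uses no damping, and because this particular graph is bipartite the undamped iteration oscillates rather than converges, so this sketch adds the usual damping factor beta (an addition not in the walkthrough above; all names are illustrative):

```python
def pagerank(links, beta=0.85, iters=200, tol=1e-10):
    """Iterate rank updates over an out-link dict until they settle."""
    n = len(links)
    pr = {p: 1 / n for p in links}                 # uniform start: 1/5 each here
    for _ in range(iters):
        new = {p: (1 - beta) / n for p in links}   # teleportation mass
        for p, targets in links.items():           # the "map" phase
            share = beta * pr[p] / len(targets)
            for t in targets:
                new[t] += share                    # the "reduce" phase sums
        done = max(abs(new[p] - pr[p]) for p in links) < tol
        pr = new
        if done:
            break
    return pr

graph = {1: [4], 2: [4], 3: [4, 5], 4: [1, 2, 3], 5: [3]}
ranks = pagerank(graph)   # page 4 ends up with the largest rank
```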