16

I am working on building a transition matrix for implementing the PageRank algorithm. How could I use numpy to make sure that the columns add up to one.

For example:

1 1 1   
1 1 1  
1 1 1

should be normalized to be

.33 .33 .33  
.33 .33 .33  
.33 .33 .33
Simon
  • 591
  • 1
  • 7
  • 17

2 Answers2

29

Divide the elements of each column by their column-summations -

a/a.sum(axis=0,keepdims=1) # or simply : a/a.sum(0)

For making the row-summations unity, change the axis input -

a/a.sum(axis=1,keepdims=1)

Sample run -

In [78]: a = np.random.rand(4,5)

In [79]: a
Out[79]: 
array([[ 0.37,  0.74,  0.36,  0.41,  0.44],
       [ 0.51,  0.86,  0.91,  0.03,  0.76],
       [ 0.56,  0.46,  0.01,  0.86,  0.38],
       [ 0.72,  0.66,  0.56,  0.84,  0.69]])

In [80]: b = a/a.sum(axis=0,keepdims=1)

In [81]: b.sum(0) # Verify
Out[81]: array([ 1.,  1.,  1.,  1.,  1.])

To make sure it works on int arrays as well for Python 2.x, use from __future__ import division or use np.true_divide.


For columns adding upto 0

For columns that add upto 0, assuming that we are okay with keeping them as they are, we can set the summations to 1, rather than divide by 0, like so -

sums = a.sum(axis=0,keepdims=1); 
sums[sums==0] = 1
out = a/sums
Divakar
  • 218,885
  • 19
  • 262
  • 358
  • Fails for columns with sum zero. – grofte Apr 12 '18 at 08:16
  • @grofte As the question states we need to make the output such that the sum for each col adds upto 1. So, for columns that adds upto exactly `0`, I don't see how we can convert them to add upto `1`. – Divakar Apr 12 '18 at 08:41
  • But the OP is making a PageRank implementation. You can have nodes that don't point to anything. When this happens then they need 1/n in each value of the column. – grofte Apr 12 '18 at 09:16
  • @grofte So, are you saying that columns that add upto `0`, should stay as such. i.e. add upto `0` and not worry about adding upto `1`?. Think OP should have qualified on that. – Divakar Apr 12 '18 at 09:18
  • OK. Here's an article about PageRank: https://www.rose-hulman.edu/~bryan/googleFinalVersionFixed.pdf – grofte Apr 12 '18 at 09:25
  • @grofte Seems like an informative piece, though I don't have time to read. Made edits in this post to take care of that case and not change column elements. – Divakar Apr 12 '18 at 09:32
-2
for i in range(len(A[0])):
    col_sum = A[:, i].sum()
    if col_sum != 0:
        A[:, i] = A[:, i]/col_sum
    else: 
        pass

The for loop is a bit sloppy and I'm sure there's a much more elegant way but it works.
Replace pass with A[:, i] = 1/len(A[0]) to eliminate dangling nodes and make the matrix column stocastic.

grofte
  • 1,839
  • 1
  • 16
  • 15