
I have written the following code to convert a matrix into a stochastic and irreducible matrix, following the paper "Deeper Inside PageRank". The code works for square matrices but raises an error for rectangular ones. How can I modify it so that it also converts rectangular matrices into stochastic and irreducible matrices?

My Code:

 import numpy as np
 P = np.array([[0, 1/2, 1/2, 0, 0, 0], [0, 0, 0, 0, 0, 0], [1/3, 1/3, 0, 0, 1/3, 0], [0, 0, 0, 0, 1/2, 1/2], [0, 0, 0, 1/2, 0, 1/2]])
 # P is the original matrix; it contains an all-zero row

 col_len = len(P[0])
 row_len = len(P)

 eT = np.ones(shape=(1, col_len))  # Row vector of ones to replace row of zeros
 e = eT.transpose()  # it is a column vector e
 eT_n = np.array(eT / col_len) # obtained by dividing row vector of ones by order of matrix

 Rsum = 0
 for i in range(row_len):
     for j in range(col_len):
         Rsum = Rsum + P[i][j]
 if Rsum == 0:
     P[i] = eT_n
 Rsum = 0
 P_bar = P.astype(float) # P_bar is the stochastic matrix obtained by replacing each row of zeros in P by eT_n
 alpha = 0.85

 P_dbar = alpha * P_bar + (1 - alpha) * e * (eT_n) #P_dbar is the irreducible matrix
 print("The stochastic and irreducible matrix P_dbar is:\n", P_dbar)

Expected output:

A rectangular stochastic and irreducible matrix.

Actual output:

Traceback (most recent call last):
  File "C:/Users/admin/PycharmProjects/Recommender/StochasticMatrix_11Aug19_BSK_v3.py", line 13, in <module>
P_dbar = alpha * P_bar + (1 - alpha) * e * (eT_n) #P_dbar is the irreducible matrix
ValueError: operands could not be broadcast together with shapes (5,6) (6,6)
BiSarfraz
  • Why have you tagged with both python-2.7 and python-3.x? Which version are you really using here? Your code runs without error for me using python 3.7 – Håken Lid Oct 14 '19 at 10:19
  • Where are you using the `P` array? You define it in line 2, but it's never used again. Questions seeking help debugging must include a minimal, complete, and verifiable example. – Håken Lid Oct 14 '19 at 10:24
  • I am using Python 3.7. – BiSarfraz Oct 14 '19 at 11:23
  • @HåkenLid it is working for square matrices, but if I change the input and run it for a rectangular matrix then it generates an error. I have given input above on which it is giving an error. – BiSarfraz Oct 14 '19 at 11:31
  • I have edited the code. P_bar is obtained by replacing each row of zeros in P by eT_n. – BiSarfraz Oct 14 '19 at 11:45

1 Answer

You are trying to add two arrays of different shapes. That will not work: alpha * P_bar has shape (5, 6), i.e. 30 elements, while e * eT_n has shape (6, 6), i.e. 36 elements, and those two shapes cannot be broadcast together.

You have to make sure the array e * eT_n has the same shape as your input array P.

You compute row_len, but you never use it when building e. If e has row_len rows instead of col_len rows, your code will run.

# e = eT.transpose()  # this will only work when the input array is square
e = np.ones(shape=(row_len, 1))  # this also works with a rectangular P 

You can check that the shape is correct:

(e * eT_n).shape == P.shape 
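
To see why the number of rows in e matters, here is a minimal sketch of the two outer products. The names n_rows, n_cols, e_square and e_rect are illustrative only and do not appear in the question.

```python
import numpy as np

n_rows, n_cols = 5, 6  # shape of the rectangular P from the question

eT_n = np.ones((1, n_cols)) / n_cols  # row vector, shape (1, 6)
e_square = np.ones((n_cols, 1))       # what eT.transpose() produces, shape (6, 1)
e_rect = np.ones((n_rows, 1))         # one entry per row of P, shape (5, 1)

# Multiplying a column vector by a row vector broadcasts to an outer product.
print((e_square * eT_n).shape)  # (6, 6): cannot be added to a (5, 6) array
print((e_rect * eT_n).shape)    # (5, 6): same shape as P
```

Only the second product can be added element-wise to alpha * P_bar.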

You should study the numpy documentation and tutorials to learn how to use the ndarray data structure. It's very powerful, but also quite different from the native python data types.

For example, you can replace this verbose and very slow nested Python loop with a single vectorized array operation.

Original code (with fixed indentation):

for i in range(row_len):
    Rsum = 0
    for j in range(col_len):
        Rsum = Rsum + P[i][j]
    if Rsum == 0:
        P[i] = eT_n

Idiomatic numpy code:

P[P.sum(axis=1) == 0] = eT_n
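
As a quick sanity check, the following sketch compares the loop and the boolean-mask assignment on the P from the question. The names uniform_row, P_loop, P_vec and zero_rows are hypothetical, and a 1-D uniform_row stands in for eT_n. It assumes P has a float dtype, which holds here because P contains fractions.

```python
import numpy as np

P = np.array([[0, 1/2, 1/2, 0, 0, 0],
              [0, 0, 0, 0, 0, 0],
              [1/3, 1/3, 0, 0, 1/3, 0],
              [0, 0, 0, 0, 1/2, 1/2],
              [0, 0, 0, 1/2, 0, 1/2]])

# Uniform row with entries 1 / number_of_columns (plays the role of eT_n).
uniform_row = np.full(P.shape[1], 1 / P.shape[1])

# Loop version: inspect each row and replace it if it sums to zero.
P_loop = P.copy()
for i in range(P_loop.shape[0]):
    if P_loop[i].sum() == 0:
        P_loop[i] = uniform_row

# Vectorized version: a boolean mask selects all zero-sum rows at once.
P_vec = P.copy()
zero_rows = P_vec.sum(axis=1) == 0
P_vec[zero_rows] = uniform_row

print(zero_rows)                       # marks which rows were replaced
print(np.array_equal(P_loop, P_vec))   # both approaches agree
```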

Furthermore, you don't need to create the array eT_n. Since it's just a single value repeated, you can assign the scalar 1/6 directly instead.

# eT = np.ones(shape=(1, col_len))  
# eT_n = np.array(eT / col_len)

P[P.sum(axis=1) == 0] = 1 / P.shape[1]
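
Putting the pieces together, the whole computation could be sketched as below. The function name make_stochastic_irreducible is hypothetical, and the sketch assumes every non-zero row of P already sums to 1; it does not normalize rows. Since e * eT_n is just the constant 1 / col_len everywhere, a scalar is added instead of an array. Whether the result for a non-square P should be called "stochastic and irreducible" is a separate question that this sketch does not settle.

```python
import numpy as np

def make_stochastic_irreducible(P, alpha=0.85):
    """Hypothetical helper: fix all-zero rows, then add a uniform term."""
    P_bar = np.array(P, dtype=float)            # float copy; the input is left untouched
    n_cols = P_bar.shape[1]
    P_bar[P_bar.sum(axis=1) == 0] = 1 / n_cols  # replace each all-zero row by a uniform row
    # (1 - alpha) / n_cols is a scalar, so it broadcasts to any shape of P_bar.
    return alpha * P_bar + (1 - alpha) / n_cols

P = np.array([[0, 1/2, 1/2, 0, 0, 0],
              [0, 0, 0, 0, 0, 0],
              [1/3, 1/3, 0, 0, 1/3, 0],
              [0, 0, 0, 0, 1/2, 1/2],
              [0, 0, 0, 1/2, 0, 1/2]])

P_dbar = make_stochastic_irreducible(P)
print(P_dbar.shape)        # same shape as P
print(P_dbar.sum(axis=1))  # row sums, to check that each row sums to 1
```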
Håken Lid
  • Dear Håken Lid, thank you so much for the insightful help. But if I use the following input, the resulting matrix is not stochastic: `P = np.array([[1, 0, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0], [0, 1, 1, 0]])` – BiSarfraz Oct 14 '19 at 13:42
  • This also gives the wrong output for the following inputs: `P = np.array([[0, 0, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0], [0, 1, 1, 0]])` and `P = np.array([[0, 1, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0], [0, 1, 0, 1]])` – BiSarfraz Oct 14 '19 at 13:51
  • It also gives the wrong output for `P = np.array([[0, 1/2], [0, 0], [1/3, 1/3], [0, 0]])` and `P = np.array([[0, 1/2, 1/2, 0, 0, 0], [0, 0, 0, 0, 1/2, 0], [1/3, 1/3, 0, 0, 1/3, 0], [0, 0, 0, 0, 1/2, 0]])`. I am using the following code to check: `print("check \n", P_dbar.sum(axis=1))` – BiSarfraz Oct 14 '19 at 13:59
  • I can tell you what is wrong with your python code, but I have no idea what output you want from those inputs, and I also don't really know if a non-square matrix could be called a stochastic matrix. Maybe that question is better suited at https://datascience.stackexchange.com/ ? – Håken Lid Oct 14 '19 at 16:49
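
Two NumPy-level observations may explain part of what the comments above report; they are not stated anywhere in the thread. An array built only from Python integers has an integer dtype, so assigning a fraction into it is silently truncated to 0. The zero-row replacement also leaves non-zero rows unchanged, so a row that does not already sum to 1 stays that way. A minimal sketch using one of the inputs from the comments (the names P_int, P_trunc and P_float are illustrative):

```python
import numpy as np

# Input taken from the comments: only integers, so the dtype is an integer type.
P_int = np.array([[0, 0, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0],
                  [0, 1, 0, 0], [0, 1, 1, 0]])
print(P_int.dtype.kind)  # 'i' means integer

# Assigning 0.25 into an integer array truncates it to 0.
P_trunc = P_int.copy()
P_trunc[P_trunc.sum(axis=1) == 0] = 1 / P_trunc.shape[1]
print(P_trunc[0])  # the zero row is still all zeros

# Casting to float first keeps the fraction ...
P_float = P_int.astype(float)
P_float[P_float.sum(axis=1) == 0] = 1 / P_float.shape[1]
print(P_float[0])

# ... but the last row of the input sums to 2, and nothing here rescales it.
print(P_float.sum(axis=1))
```

Whether such rows should be rescaled, for example by dividing each row by its sum, depends on the intended output, which the thread leaves open.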