Huge matrix multiplications can run for hours and consume a lot of memory. There are already some questions on Stack Overflow asking how to reduce the memory consumption, for instance "Python NUMPY HUGE Matrices multiplication".
One suggestion is to perform the multiplication row by row, so that only two rows are loaded at the same time. I find this pretty straightforward, although very slow. Another suggestion is to use sparse representations, and that is where my mathematical knowledge reaches its limits.
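As far as I understand it, the row-by-row suggestion would look roughly like the sketch below. The function name `row_sums_by_row` and the toy data are my own assumptions for illustration, not taken from the linked question. Only the vector and the current row are held as NumPy arrays at any time.

```python
import numpy as np

def row_sums_by_row(V, M):
    """Multiply each row of M element-wise by V and sum it, one row at a time."""
    v = np.asarray(V, dtype=float)
    sums = []
    for row in M:
        # Only the vector and the current row exist as arrays here,
        # so no full-size intermediate matrix is ever built.
        sums.append(float(np.sum(v * np.asarray(row, dtype=float))))
    return sums

# Hypothetical toy data, chosen only to keep the arithmetic easy to check.
toy_V = [0.5, 0.25]
toy_M = [[1, 0], [0, 1], [1, 1]]
print(row_sums_by_row(toy_V, toy_M))  # → [0.5, 0.25, 0.75]
```

The Python-level loop is what makes this slow for large inputs. Processing blocks of rows instead of single rows would presumably trade some memory back for speed.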
Although this is a big topic, I haven't found any simple example implementations yet, so I am asking this question on the basis of a reprex (reproducible example). It will probably help some people in the future. In this example, I have a vector and a matrix, which I would like to multiply element-wise. After that, I compute the sum of every row in the resulting matrix. My example looks like this:
myV = [6.29586100e-05, 5.04149100e-04, 1.44845100e-05]
myM = [[1,0,0],[0,0,1],[0,0,1],[0,0,0],[0,1,1]]
result = myfun(myV, myM)
where myfun is defined as follows:
import numpy as np

def myfun(V, M):
    # convert lists into matrices
    npV = np.matrix(V)
    npM = np.matrix(M)
    # multiply element-wise; npV (a single row) is broadcast across every row of npM
    results = np.multiply(npV, npM)
    # print for understanding
    print('npV'); print(npV)
    print('npM'); print(npM)
    print('results'); print(results)
    # sum probabilities of morph given every word
    sum_results = np.sum(results, axis=1).tolist()
    print('sum_results'); print(sum_results)
    # flatten the nested list of row sums into a flat list
    list_results = [value for sublist in sum_results for value in sublist]
    return list_results
For illustration, I printed what happens:
npV
[[ 6.29586100e-05 5.04149100e-04 1.44845100e-05]]
npM
[[1 0 0]
 [0 0 1]
 [0 0 1]
 [0 0 0]
 [0 1 1]]
results
[[ 6.29586100e-05 0.00000000e+00 0.00000000e+00]
 [ 0.00000000e+00 0.00000000e+00 1.44845100e-05]
 [ 0.00000000e+00 0.00000000e+00 1.44845100e-05]
 [ 0.00000000e+00 0.00000000e+00 0.00000000e+00]
 [ 0.00000000e+00 5.04149100e-04 1.44845100e-05]]
sum_results
[[6.295861e-05], [1.448451e-05], [1.448451e-05], [0.0], [0.00051863361]]
However, if this simple calculation is applied to a huge vector and matrix, it consumes a lot of memory. How could the code be improved to consume less memory without increasing the run time too drastically?
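One direction I have considered, as a minimal sketch only: multiplying every row of the matrix element-wise by the vector and then summing each row gives, for row i, the sum over j of M[i, j] * V[j]. That is the i-th entry of the matrix-vector product. The name `myfun_dot` is my own, and I am assuming that plain NumPy arrays are acceptable in place of `np.matrix`. This version never builds the intermediate result matrix, although the matrix itself still has to fit in memory.

```python
import numpy as np

def myfun_dot(V, M):
    """Row sums of the element-wise product, computed as a matrix-vector product."""
    v = np.asarray(V, dtype=float)
    m = np.asarray(M)
    # (m @ v)[i] == sum_j m[i, j] * v[j], so no (rows x columns) intermediate is allocated.
    return (m @ v).tolist()

myV = [6.29586100e-05, 5.04149100e-04, 1.44845100e-05]
myM = [[1, 0, 0], [0, 0, 1], [0, 0, 1], [0, 0, 0], [0, 1, 1]]
print(myfun_dot(myV, myM))
```

Up to floating-point rounding, this should print the same five sums as the example above. Whether it is enough for a truly huge matrix, or whether a sparse representation is still needed, is exactly what I am unsure about.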