I'm currently implementing this paper in Python for my undergraduate thesis, though I only use the Mahalanobis metric learning part (in case you're curious).
In short, I run into a problem when I need to learn a 67K x 67K integer matrix, computed simply as numpy.dot(A.T, A),
where A is a random vector of shape (1, 67K). Doing that throws a MemoryError, since my PC only has 8 GB of RAM and by my rough calculation the matrix needs about 16 GB just to initialize. So I searched for an alternative and found dask.
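As a rough check of that estimate, here is a minimal sketch of the arithmetic. It assumes the vector length 512*2*66 = 67,584 used in my code further down and a dense matrix stored with 4-byte or 8-byte elements; the helper name dense_bytes is only for illustration.

```python
# Rough memory estimate for a dense n x n matrix (standard library only).
fv_length = 512 * 2 * 66  # 67,584, the vector length used in this post

def dense_bytes(rows, cols, itemsize):
    """Bytes needed for a dense rows x cols array with itemsize-byte elements."""
    return rows * cols * itemsize

GIB = 1024 ** 3
# Assumed element sizes: 4 bytes (int32) and 8 bytes (int64 / float64).
for name, itemsize in [("int32", 4), ("int64/float64", 8)]:
    size = dense_bytes(fv_length, fv_length, itemsize)
    print(f"{name}: {size / GIB:.1f} GiB")
# prints:
# int32: 17.0 GiB
# int64/float64: 34.0 GiB
```

So the figure is in the right range for 4-byte integers, and roughly double that if the matrix ends up as 8-byte integers or floats.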
So I moved to dask and used dask.array.dot(A.T, A),
which completes without error. But then I need to apply a whitening transformation to that matrix, and in dask I can do that by computing the SVD. Every time I run the SVD, though, the IPython kernel dies (I assume due to lack of memory).
This is what I do from initialization until the kernel dies:
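import dask.array as da

fv_length = 512*2*66  # 67,584
W = da.random.randint(10, 20, (fv_length), (1000, 1000))
W = da.reshape(W, (1, fv_length))  # row vector of shape (1, fv_length)
W_T = W.T
Wt = da.dot(W_T, W); del W, W_T  # outer product, shape (fv_length, fv_length)
Wt = da.reshape(Wt, (fv_length*fv_length//2, 2))  # integer division keeps the shape an int
U, S, Vt = da.linalg.svd(Wt); del Wt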
I never get U, S, and Vt back.
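For reference, a back-of-the-envelope sketch of how large the SVD outputs would be for the reshape in the code above. It assumes a reduced SVD (U is m x k, S has k entries, Vt is k x n, with k = min(m, n)) and 8-byte floats for the results; nbytes is a hypothetical helper, not a dask function.

```python
# Shapes and sizes of the SVD outputs, computed with plain arithmetic.
fv_length = 512 * 2 * 66
m, n = fv_length * fv_length // 2, 2  # shape of Wt after the reshape
k = min(m, n)                          # assumed reduced-SVD rank

shapes = {"U": (m, k), "S": (k,), "Vt": (k, n)}

def nbytes(shape, itemsize=8):
    """Bytes for a dense array of this shape (assumed 8-byte elements)."""
    total = itemsize
    for dim in shape:
        total *= dim
    return total

for name, shape in shapes.items():
    print(name, shape, f"{nbytes(shape) / 1024**3:.2f} GiB")
```

Under those assumptions U has as many elements as Wt itself, about 34 GiB as 8-byte floats, while S and Vt are negligible.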
Is my memory simply not enough for this sort of thing, even with dask? Or is this not a hardware problem at all, but bad memory management on my part? Or something else?
At this point I'm desperate enough to try a machine with bigger specs, so I'm planning to rent a bare-metal server with 32 GB of RAM. Even if I do, will that be enough?