
I'm using the gradient descent implementation below in Octave for ML.
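Roughly, it's the standard vectorized batch gradient descent for linear regression from the ML course. This is a representative sketch rather than my exact code (variable names are assumptions):

    % Sketch of a typical vectorized batch gradient descent (linear regression).
    % X is m x n, y is m x 1, theta is n x 1; alpha and num_iters are scalars.
    function theta = gradientDescent (X, y, theta, alpha, num_iters)
      m = length (y);
      for iter = 1:num_iters
        err   = X * theta - y;                     % X*theta is matrix-VECTOR (GEMV)
        theta = theta - (alpha / m) * (X' * err);  % X'*err is also GEMV
      end
    end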

I first tried increasing the number of CPU cores and running Octave multithreaded with OpenBLAS, but I still didn't get the results I was looking for, so I tried Nvidia's CUDA toolkit and their Tesla K80 GPU.

I'm loading Octave with the drop-in NVBLAS library, following the instructions in this article:

Drop-in Acceleration of GNU Octave

When I checked nvidia-smi I found the GPU to be idle while my script ran, although my testing with a matrix-matrix multiplication yields ~9 teraflops.
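The throughput test was something along these lines (the matrix size is arbitrary, and the first multiplication is a warm-up since it includes GPU initialization and transfer overhead):

    % Quick GEMM throughput check; N is arbitrary.
    N = 8192;
    A = rand (N);  B = rand (N);
    C = A * B;                      % warm-up call (GPU init, data transfer)
    tic; C = A * B; t = toc;        % timed matrix-matrix product (GEMM)
    printf ("~%.2f TFLOPS\n", 2 * N^3 / t / 1e12);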

Later I came to understand that the matrix-vector multiplication used in the implementation above is not supported, as per the NVBLAS documentation (NVBLAS only intercepts Level-3 BLAS routines such as GEMM, not Level-2 routines like GEMV).

So my question is: is there a gradient descent implementation that uses matrix-matrix multiplication, or something equivalent, that can replace the gradient descent implementation I have?
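One direction I'm considering, as a sketch rather than a tested solution (it assumes plain linear-regression least squares): the gradient (1/m) * X' * (X*theta - y) can be rewritten as (1/m) * (X'*X*theta - X'*y), so the expensive products with the large m x n matrix X can be hoisted out of the loop as a one-time matrix-matrix product that NVBLAS can offload, leaving only cheap n x n work per iteration:

    % Hedged sketch: same iterates as the loop above, but the dominant
    % O(m*n^2) work becomes a single GEMM done before the loop.
    function theta = gradientDescentGemm (X, y, theta, alpha, num_iters)
      m = length (y);
      A = (X' * X) / m;   % n x n, true matrix-matrix product (GEMM)
      b = (X' * y) / m;   % n x 1, computed once
      for iter = 1:num_iters
        theta = theta - alpha * (A * theta - b);  % small n x n GEMV per step
      end
    end

Another variant, if several models are fit at once (e.g. one-vs-all classifiers or a sweep over learning rates), would be to stack the parameter vectors into an n x k matrix Theta so that X * Theta itself becomes a GEMM.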

  • What is your goal? The ML course and the provided scripts are intended to teach how gradient descent works, not to provide a fast implementation. If speed is your goal you should stick to faster algorithms (perhaps ones which don't get stuck in local extrema). Some of them are implemented in Fortran or C++. If the result is still not sufficient, you can try to port them to use your CUDA GPU... If your goal is to learn how to use CUDA, then first learn the basics and then write a wrapper so you can use it from Matlab and/or Octave (for example via the MEX interface). – Andy Apr 15 '17 at 08:06
  • Thanks @Andy for your comment. My goal is to speed up the execution of the code I wrote based on gradient descent; it will be hard for me to switch technologies or change algorithms at this point in time. – Fady Anwar Apr 17 '17 at 09:26
  • If speed is your goal you really, really should have a look at the other optimization algorithms in GNU Octave like fminunc (which is a gradient search at its core), Nelder-Mead simplex, BFGS, simulated annealing, ... which are part of the octave-forge optim package. – Andy Apr 17 '17 at 16:47
  • I was actually checking fminunc, but I wanted to know in advance whether it will use NVBLAS matrix-matrix multiplication to capitalize on the GPU's enhanced performance. – Fady Anwar Apr 18 '17 at 13:03

0 Answers