I am drafting and testing a technique I devised for solving differential equations, with an eye to speed and efficiency.
It would require storing, manipulating, resizing, and (at some point) probably diagonalizing very large sparse matrices. I would like to have rows consisting of zeros and a few (say, fewer than 5) ones, and to add them a few at a time (on the order of the number of CPUs being used).
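To make the access pattern concrete, here is a minimal sketch of what I mean, assuming SciPy's `scipy.sparse` as the storage layer (just one candidate, not something I am committed to). The helper name `make_rows`, the column count, and the example column indices are all made up for illustration. Each batch of new rows is built directly in CSR form, and the batches are stacked once at the end rather than resizing the matrix on every append.

```python
import numpy as np
import scipy.sparse as sp

def make_rows(col_indices_per_row, n_cols):
    """Build a CSR block whose rows are all zero except for ones
    at the listed column positions (hypothetical helper)."""
    indptr = [0]
    indices = []
    for cols in col_indices_per_row:
        indices.extend(sorted(cols))   # column positions of the ones in this row
        indptr.append(len(indices))    # where the next row starts
    data = np.ones(len(indices), dtype=np.int8)
    shape = (len(col_indices_per_row), n_cols)
    return sp.csr_matrix((data, indices, indptr), shape=shape)

n_cols = 1_000_000  # illustrative size only

# Collect small batches (e.g. one row per worker) and stack them once,
# since repeated stacking copies the whole matrix each time.
blocks = []
blocks.append(make_rows([[3, 17, 999_999], [0, 5]], n_cols))
blocks.append(make_rows([[42], [7, 8, 9, 10]], n_cols))

A = sp.vstack(blocks, format="csr")
```

Only the positions of the ones are stored, so memory scales with the number of nonzeros rather than with the full matrix dimensions.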
I thought it would be useful to have GPU acceleration, so any suggestions on the best way to take advantage of that (say PyCUDA, Theano, etc.) would be appreciated too.