3

I use Julia's linear equation solver res = X\b extensively. I have to call it millions of times in my program because of parameter variation. This worked fine while I was using small dimensions (up to 30). Now that I want to analyse bigger systems, up to 1000, the linear solver is no longer efficient.

I think there may be a workaround. However, I should mention that sometimes my X matrix is dense and sometimes it is sparse, so I need something that works well in both cases.

The b vector has all zeros except for one entry, which is always 1 (in fact it is always the last entry). Moreover, I don't need the whole res vector, just its first entry.
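For concreteness, a minimal sketch of the setup described above, in Julia 1.x syntax; the matrix T, the size N and the range of x values are placeholders, and the form X = x*eye(N,N) - T is taken from the comments below:

```julia
using LinearAlgebra

N = 1000
T = randn(N, N)                     # placeholder for the fixed matrix
b = zeros(N); b[end] = 1.0          # single 1 in the last entry

first_entries = map(0.1:0.1:10.0) do x
    X = x * I - T                   # X = x*eye(N,N) - T in older Julia syntax
    res = X \ b                     # the step that becomes slow for large N
    res[1]                          # only the first entry is needed
end
```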

user2820579
  • This likely isn't quite what you're looking for, but I've become quite enamored of the power of GPUs. If you've got access to that hardware, it could be quite useful for your situation here. – Michael Ohlrogge Jul 20 '16 at 17:38
  • You have to solve it millions of times. What about saving the LU factorization? This will work if you aren't changing X each iteration. Just do `X=lufact(X)` and then `X\b` should be faster. But this is if you're varying `b` (which happens in a lot of PDE solvers); see the sketch after these comments. – Chris Rackauckas Jul 20 '16 at 17:41
  • Perhaps it will work, but `X` does vary. It is written in the form `X = x*eye(N,N) - T`, where `T` is a fixed matrix and `x` is a parameter for which I have to solve the system `res = X\b`. – user2820579 Jul 20 '16 at 17:50
  • 1
    Subtracting out something proportional to the identity matrix would change factorizations in ways that are likely tough to predict, so I don't think factorizing will help here. [You may want to ask here](http://scicomp.stackexchange.com/) if anyone has a mathematical trick to make this problem faster (using a factorization of `x*eye(N,N) - T` to get a factorization with a different `x`). Someone may have a research article because of how related this may be to doing related eigenvalue computations, but I think doing this well would be more mathematically difficult than expected. – Chris Rackauckas Jul 20 '16 at 18:11
  • Otherwise, I think the only way to really handle it is to throw more cores at the problem, i.e. taking @MichaelOhlrogge 's suggestion to try using GPUs. If you haven't done it before, this could be a good place to use ArrayFire.jl. The syntax is very nice to make feel like it's not on the GPU, but you still get some good speedup if the matrices are big enough. 1000x1000 matrices may be too small for GPU speedups though. This is an oddly difficult problem (millions of linear solving problems on medium sized matrices, instead of a few solutions of a million x million matrix). – Chris Rackauckas Jul 20 '16 at 18:15
  • I would think though that you could potentially do, at least, 100x100 matrices on each GPU core, and potentially larger. I haven't experimented a lot with this, and for that, you'd need to write your own kernels, rather than using ArrayFire or a similar package. See, e.g. here (http://stackoverflow.com/questions/13371082/could-a-cuda-kernel-call-a-cublas-function) – Michael Ohlrogge Jul 20 '16 at 18:18
  • But the kernels would be very simple to write. It was the "doing something millions of times" in your description that made me think of farming out each one of those of the millions to a separate GPU core. – Michael Ohlrogge Jul 20 '16 at 18:26
  • 1
    If GPU is an option, you may want to try [cublasgetrfBatched()](http://docs.nvidia.com/cuda/cublas/index.html#cublas-lt-t-gt-getrfbatched) and [cublasgetrsBatched()](http://docs.nvidia.com/cuda/cublas/index.html#cublas-lt-t-gt-getrsbatched), which is optimized for solving a batch of small linear systems at a time. – kangshiyin Jul 20 '16 at 19:16
  • Maybe if you tell us what you actually want to do, we can suggest an alternative solution. Do you have working code that you can post? – David P. Sanders Jul 20 '16 at 23:22
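To illustrate the factorization-reuse idea mentioned in the comments, here is a minimal sketch in Julia 1.x syntax (`lu` replaces the older `lufact`; the matrix is a made-up placeholder). Note that this only pays off when X stays fixed and b varies, which is not the situation in this question:

```julia
using LinearAlgebra

N = 1000
X = randn(N, N) + 10I        # hypothetical fixed matrix
F = lu(X)                    # factorize once (lufact(X) in older Julia)

for k in 1:5                 # varying right-hand sides reuse the O(N^3) factorization
    b = zeros(N); b[k] = 1.0
    res = F \ b              # each solve is only O(N^2)
    println(res[1])
end
```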

3 Answers

6

If your problem is of the form (A - µI)x = b, where µ is a variable parameter and A, b are fixed, you might work with diagonalization.

Let A = P D P⁻¹, where P⁻¹ denotes the inverse of P and D is the diagonal matrix of the eigenvalues of A. Then (P D P⁻¹ - µI)x = b can be transformed to

(D - µI) P⁻¹x = P⁻¹b,
P⁻¹x = P⁻¹b / (D - µI),
x = P (P⁻¹b / (D - µI)).

(The / operation denotes division of the respective vector elements by the scalars D_ii - µ, i.e. by the shifted eigenvalues.)

After you have diagonalized A, computing a solution for any µ reduces to two matrix/vector products, or a single one if you can also precompute P⁻¹b.

Numerical instability will show up when µ is in the vicinity of the eigenvalues of A.
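A minimal sketch of this idea in Julia 1.x syntax, written directly for the form X = x*eye(N,N) - T from the question's comments (the answer's A - µI differs from this only by an overall sign); T here is a random placeholder and eigen assumes it is diagonalizable:

```julia
using LinearAlgebra

N = 1000
T = randn(N, N)                 # placeholder for the fixed matrix
b = zeros(N); b[end] = 1.0

# One-time cost: diagonalize T.
F  = eigen(T)                   # T ≈ F.vectors * Diagonal(F.values) * inv(F.vectors)
P  = F.vectors
Pb = P \ b                      # P⁻¹ * b, computed once

# Per-parameter cost: one elementwise division and one dot product,
# since only the first entry of res is needed.
# transpose (not adjoint) avoids conjugating the possibly complex eigenvectors.
first_entry(x) = transpose(P[1, :]) * (Pb ./ (x .- F.values))

first_entry(2.5)                # for real T the imaginary part is only roundoff
```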

5

Usually when people talk about speeding up linear solvers res = X \ b, it’s for multiple bs. But since your b isn’t changing, and you just keep changing X, none of those tricks apply.

The only way to speed this up, from a mathematical perspective, seems to be to ensure that Julia is picking the fastest solver for X \ b, i.e., if you know X is positive-definite, use Cholesky, etc. Matlab's flowcharts for how it picks the solver to use for X \ b, for dense and sparse X, are available; most likely Julia implements something close to these flowcharts too, but maybe you can find some way to simplify or shortcut it.
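For illustration, a minimal sketch (Julia 1.x syntax; the symmetric positive-definite test matrix is a made-up placeholder) of steering the solver choice explicitly instead of relying on the generic dispatch of \:

```julia
using LinearAlgebra

N = 1000
A = randn(N, N)
X = A' * A + I                       # hypothetical symmetric positive-definite matrix
b = zeros(N); b[end] = 1.0

res1 = X \ b                         # generic: Julia inspects X, then picks a solver
res2 = cholesky(Symmetric(X)) \ b    # skip the inspection: we assert X is SPD
res3 = lu(X) \ b                     # plain LU, when no special structure is known
```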

All the programming-related speedups can then be applied on top: multiple threads (each individual solver is probably already multi-threaded, but it may be worth running several solvers in parallel when each solver uses fewer threads than there are cores); @simd if you're willing to dive into the solvers themselves; OpenCL/CUDA libraries; etc.

Ahmed Fasih
  • As expected, I was wrong to close the door on mathematical speedups! See [@Yves Daoust’s answer](http://stackoverflow.com/a/38565259/500207). – Ahmed Fasih Jul 26 '16 at 00:32
1

The best approach for efficiency would be to use JuliaMath/IterativeSolvers.jl. For A * x = b problems, I would recommend x = lsmr(A, b).

The second-best alternative would be to give a bit more information to the solver: instead of x = inv(A'A) * A' * b, do x = inv(cholfact(A'A)) * A' * b if a Cholesky decomposition works for you. Otherwise, you could try U, S, V = svd(A) and x = V * diagm(1 ./ S) * U' * b.

Unsure if x = pinv(A) * b is optimized, but might be slightly more efficient than x = A \ b.
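A minimal sketch of the lsmr route, assuming the IterativeSolvers.jl package is installed (Julia 1.x syntax; the sparse test matrix is a made-up placeholder; lsmr only needs matrix-vector products, so it works for dense and sparse A alike):

```julia
using IterativeSolvers, LinearAlgebra, SparseArrays

N = 1000
A = sprandn(N, N, 0.01) + 10I    # hypothetical well-conditioned sparse system
b = zeros(N); b[end] = 1.0

x = lsmr(A, b)                   # iterative least-squares solve of A * x = b
x[1]                             # only the first entry is needed by the asker
```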