My project relies heavily on logsumexp. I'm currently using this library, https://github.com/rmcgibbo/logsumexp, which is implemented with the SSE instruction set.

However, modern Intel CPUs support the much more powerful AVX instruction sets. Is there a faster logsumexp implementation for Python that uses AVX, or even CUDA?
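For reference, the operation I mean is log(sum(exp(x))) computed with the usual max-shift trick so that `exp` does not overflow. Below is a minimal pure-Python sketch of that definition. It is not the API of the library linked above, and the function name `logsumexp_reference` is just a placeholder; it is only meant as a slow baseline to check a faster AVX or CUDA version against.

```python
import math


def logsumexp_reference(values):
    """Slow reference for log(sum(exp(v) for v in values)).

    Subtracts the maximum before exponentiating so that exp() never
    overflows; every shifted exponent is <= 0.
    """
    values = list(values)
    if not values:
        # Sum over an empty set is 0, and log(0) is -inf.
        return -math.inf
    m = max(values)
    if math.isinf(m):
        # m == -inf: every term is exp(-inf) = 0, so the result is -inf.
        # m == +inf: the sum is infinite, so the result is +inf.
        return m
    # fsum reduces rounding error when many terms are accumulated.
    total = math.fsum(math.exp(v - m) for v in values)
    return m + math.log(total)


if __name__ == "__main__":
    # A naive exp() of these inputs would overflow; the shifted form does not.
    print(logsumexp_reference([1000.0, 1000.0]))
```

Any vectorized version should agree with this to within floating-point tolerance, including on inputs that are very large or very negative.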

Thank you.

user2131907
  • Yes, there are a few SO Q&As with fast AVX implementations of log and exp, using C intrinsics. [Fastest Implementation of Exponential Function Using AVX](https://stackoverflow.com/q/48863719), [Efficient implementation of log2(\_\_m256d) in AVX2](https://stackoverflow.com/q/45770089) (also discusses single-precision `float`), and [Logarithm with SSE, or switch to FPU?](https://stackoverflow.com/a/8907932). You could replace some functions in `mathfun.h` from your library with AVX versions. – Peter Cordes Jul 03 '18 at 22:25
  • This is really helpful! Thanks a lot! – user2131907 Jul 04 '18 at 11:27

0 Answers