10

I need an open source implementation (no restriction on license) of a log function, something with the signature

__m128d _mm_log_pd(__m128d);

It is available in the Intel Short Vector Math Library (part of ICC), but ICC is neither free nor open source. I am looking for an implementation that uses intrinsics only.

It should use special rational function approximations. I need something almost as accurate as cmath log, say 9-10 decimal digits, but faster.

David Heffernan
watson1180
  • When asking for open source code, you usually need to specify the license for your project, so that people know whether you'll be able to use code under a certain license or not. – Cascabel Dec 13 '10 at 20:28
  • License doesn't matter. It is for an in-house project. All open source licenses are good for that. – watson1180 Dec 13 '10 at 20:33
  • @Jefromi: Conversely, the answers are more likely to be useful to other questioners in the future if they aren't too narrowly constructed. – caf Dec 13 '10 at 22:10
  • What's wrong with using the FPU's log instructions? They are at least double-precision. – PhiS Dec 18 '10 at 15:21
  • @PhiS: you can implement a faster but less accurate log (or whatever) yourself. After profiling, it is sometimes the right thing to do. – Alexandre C. Dec 27 '10 at 12:01
  • Related: **[AVX2 version of the same question](https://stackoverflow.com/questions/45770089/efficient-implementation-of-log2-m256d-in-avx2)**, with an answer that explains a lot about how to implement your own, and another answer that has a working implementation. – Peter Cordes Aug 27 '17 at 02:42

5 Answers

6

I believe log2 is easier to compute. You can multiply/divide your number by a power of two (very quick) so that it lies in (0.5, 2], and then use a Padé approximant (take the numerator and denominator orders M and N close to each other), which is easy to derive once and for all and whose order you can choose according to your needs. You only need arithmetic operations that you can do with SSE intrinsics. Don't forget to add/subtract a constant according to the above scaling factor.

If you want the natural log, divide by log2(e), which you can compute once and for all.
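A minimal sketch of the steps above, using SSE2 intrinsics only. The function names `log2_pd_sketch` and `log_pd_sketch` are made up for illustration, and several choices are my assumptions rather than part of the answer: the mantissa is recentred to [sqrt(1/2), sqrt(2)) instead of (0.5, 2], the approximant is the [3/3] Padé approximant of ln(1 + t), and every lane is assumed to hold a positive, normal, finite double. At this order the absolute error should be roughly 1e-6 or better, so reaching 9-10 digits would need a higher-order approximant.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical name. Sketch only: assumes every lane holds a positive,
   normal, finite double (no zero, negative, denormal, inf or NaN handling). */
static __m128d log2_pd_sketch(__m128d x)
{
    const __m128i bits = _mm_castpd_si128(x);

    /* Unbiased exponent e: shift out the 52 fraction bits, gather the two
       results into the low 32-bit lanes, remove the bias, convert to double. */
    __m128i ei = _mm_srli_epi64(bits, 52);
    ei = _mm_shuffle_epi32(ei, _MM_SHUFFLE(3, 1, 2, 0));
    ei = _mm_sub_epi32(ei, _mm_set1_epi32(1023));
    __m128d e = _mm_cvtepi32_pd(ei);

    /* Mantissa m in [1, 2): keep the fraction bits, force the exponent to 0. */
    __m128i mi = _mm_and_si128(bits, _mm_set1_epi64x(0x000FFFFFFFFFFFFFLL));
    mi = _mm_or_si128(mi, _mm_set1_epi64x(0x3FF0000000000000LL));
    __m128d m = _mm_castsi128_pd(mi);

    /* Recentre to [sqrt(1/2), sqrt(2)): where m >= sqrt(2), halve m, bump e. */
    const __m128d big = _mm_cmpge_pd(m, _mm_set1_pd(1.4142135623730951));
    m = _mm_sub_pd(m, _mm_and_pd(big, _mm_mul_pd(m, _mm_set1_pd(0.5))));
    e = _mm_add_pd(e, _mm_and_pd(big, _mm_set1_pd(1.0)));

    /* [3/3] Pade approximant of ln(1 + t) with t = m - 1, in Horner form:
       t*(60 + 60t + 11t^2) / (60 + 90t + 36t^2 + 3t^3) */
    const __m128d t = _mm_sub_pd(m, _mm_set1_pd(1.0));
    __m128d num = _mm_add_pd(_mm_mul_pd(_mm_set1_pd(11.0), t), _mm_set1_pd(60.0));
    num = _mm_add_pd(_mm_mul_pd(num, t), _mm_set1_pd(60.0));
    num = _mm_mul_pd(num, t);
    __m128d den = _mm_add_pd(_mm_mul_pd(_mm_set1_pd(3.0), t), _mm_set1_pd(36.0));
    den = _mm_add_pd(_mm_mul_pd(den, t), _mm_set1_pd(90.0));
    den = _mm_add_pd(_mm_mul_pd(den, t), _mm_set1_pd(60.0));
    const __m128d ln_m = _mm_div_pd(num, den);

    /* log2(x) = e + ln(m) * log2(e) */
    return _mm_add_pd(e, _mm_mul_pd(ln_m, _mm_set1_pd(1.4426950408889634)));
}

/* Natural log: dividing by log2(e) is the same as multiplying by ln(2). */
static __m128d log_pd_sketch(__m128d x)
{
    return _mm_mul_pd(log2_pd_sketch(x), _mm_set1_pd(0.6931471805599453));
}
```

Recentring the mantissa around 1 keeps |t| small, which is where a Padé approximant built at t = 0 is most accurate. It also means inputs close to 1 get a result close to 0 without cancellation between the exponent term and the mantissa term.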

It is not rare to see custom log functions in some specific projects. Standard library functions address the general case, but you need something more specific. I sincerely think it is not that hard to do it yourself.

Alexandre C.
5

Take a look at AMD LibM. It isn't open source, but it is free. AFAIK, it works on Intel CPUs. On the same web page you will find a link to ACML, another free math lib from AMD. It has everything from AMD LibM plus matrix algorithms, FFTs and distributions.

I don't know of any open source implementation of double precision vectorized math functions. I guess the Intel and AMD libs are hand optimised by the CPU manufacturers, and everyone uses them when speed is important. IIRC, there was an attempt to implement intrinsics for vectorized math functions in GCC. I don't know how far they managed to get. Obviously, it isn't a trivial task.

pic11
1

The Framewave project is Apache 2.0 licensed and aims to be the open source equivalent of Intel IPP. It has implementations that are close to what you are looking for. Check the fixed accuracy arithmetic functions in the documentation.

renick
1

Here's the counterpart for __m256d: https://stackoverflow.com/a/45898937/1915854 . It should be pretty trivial to cut it down to __m128d. Let me know if you encounter any problems with this.

Or you can view my implementation as something obtaining two __m128d numbers at once.

Serge Rogatch
0

If you cannot find an existing open source implementation, it is relatively easy to create your own using the standard method of a Taylor series. See Wikipedia for this and a variety of other methods.
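As a rough illustration of a series-based approach, here is a sketch with SSE2 intrinsics. The answer does not say which series to use. This assumes the form ln(m) = 2(s + s^3/3 + s^5/5 + ...) with s = (m - 1)/(m + 1), which converges much faster than the plain Taylor series of ln(1 + t). The name `log_series_pd_sketch` is hypothetical, and the input is assumed to be already reduced to [sqrt(1/2), sqrt(2)), so exponent handling is left out. With terms up to s^11, the dropped terms should be on the order of 1e-11 on that interval.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical name. Assumes both lanes are already reduced to
   [sqrt(1/2), sqrt(2)); exponent extraction is not shown here. */
static __m128d log_series_pd_sketch(__m128d m)
{
    const __m128d one = _mm_set1_pd(1.0);
    const __m128d s   = _mm_div_pd(_mm_sub_pd(m, one), _mm_add_pd(m, one));
    const __m128d s2  = _mm_mul_pd(s, s);

    /* Horner evaluation of 1 + s2/3 + s2^2/5 + s2^3/7 + s2^4/9 + s2^5/11. */
    __m128d p = _mm_set1_pd(1.0 / 11.0);
    p = _mm_add_pd(_mm_mul_pd(p, s2), _mm_set1_pd(1.0 / 9.0));
    p = _mm_add_pd(_mm_mul_pd(p, s2), _mm_set1_pd(1.0 / 7.0));
    p = _mm_add_pd(_mm_mul_pd(p, s2), _mm_set1_pd(1.0 / 5.0));
    p = _mm_add_pd(_mm_mul_pd(p, s2), _mm_set1_pd(1.0 / 3.0));
    p = _mm_add_pd(_mm_mul_pd(p, s2), one);

    /* ln(m) ~= 2 * s * p */
    return _mm_mul_pd(_mm_add_pd(s, s), p);
}
```

The number of terms is the knob for trading accuracy against speed; each extra term costs one multiply and one add per pair of doubles.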

uesp
  • I believe a fully accurate implementation requires multiple precision arithmetic. – caf Dec 14 '10 at 00:05
  • 6
    Taylor series is not a proper way to do it. One should use special rational function approximations. I need something almost as accurate as cmath log, but faster. Otherwise I could simply dispatch everything to cmath log. ICC implementation is accurate and fast. I need something similar, but open source. – watson1180 Dec 14 '10 at 02:17
  • @watson1180 apparently rational function approximation is slower than Taylor series methods on modern hardware – David Heffernan Dec 28 '10 at 11:03