I need an AVX512 double pow(double, int n) function (I need it for a binomial distribution calculation which needs to be exact). In particular, I would like this for Knights Landing, which has AVX512ER. One way to get this is
x^n = exp2(log2(x)*n)
Knights Corner has the vlog2ps instruction (_mm512_log2_ps intrinsic) and the vexp223ps instruction (_mm512_exp223_ps intrinsic), so at least I could do float pow(float, float) with those two instructions.
However, with Knights Landing I can't find a log2 instruction. I do find a vexp2pd instruction (_mm512_exp2a23_pd intrinsic) in AVX512ER. I find it strange that Knights Corner has a log2 instruction but Knights Landing, which is newer and better, does not.
For now I have implemented pow(double, n) using repeated squaring, but I think it would be more efficient if I had a log2 instruction.
    #include <immintrin.h>

    //AVX2 but easy to convert to AVX512 with mask registers
    static __m256d pown_AVX2(__m256d base, __m256i exp) {
        __m256d result = _mm256_set1_pd(1.0);
        // testz returns 1 when every exponent lane is zero
        int mask = _mm256_testz_si256(exp, exp);
        __m256i onei = _mm256_set1_epi64x(1);
        __m256d onef = _mm256_set1_pd(1.0);
        while(!mask) {
            // t2 is all-ones in lanes whose lowest exponent bit is 0
            __m256i t1 = _mm256_and_si256(exp, onei);
            __m256i t2 = _mm256_cmpeq_epi64(t1, _mm256_setzero_si256());
            // multiply by base where the bit is set, by 1.0 otherwise
            __m256d t3 = _mm256_blendv_pd(base, onef, _mm256_castsi256_pd(t2));
            result = _mm256_mul_pd(result, t3);
            exp = _mm256_srli_epi64(exp, 1);
            base = _mm256_mul_pd(base,base);
            mask = _mm256_testz_si256(exp, exp);
        }
        return result;
    }
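For checking the vector versions, a scalar reference can help. The following is a minimal sketch of the same repeated-squaring loop for a single value; the name pown_scalar is hypothetical, and it assumes a non-negative exponent. It performs roughly 2*log2(n) multiplications, one squaring per exponent bit plus one multiply per set bit.

```c
#include <assert.h>

/* Hypothetical scalar reference: base^n by repeated squaring, n >= 0.
   Each pass handles one bit of the exponent, like one lane of the
   vector loop above. */
static double pown_scalar(double base, int n) {
    assert(n >= 0);
    unsigned int exp = (unsigned int)n;
    double result = 1.0;
    while (exp != 0) {
        if (exp & 1u)       /* lowest bit set: fold the current power in */
            result *= base;
        exp >>= 1;          /* move to the next exponent bit */
        base *= base;       /* base now holds the next power of two power */
    }
    return result;
}
```

Each lane of a vector result can then be compared against pown_scalar applied to the corresponding inputs.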
Is there a more efficient algorithm to get double pow(double, int n) with AVX512 and AVX512ER than repeated squaring? Is there an easy method (e.g. with a few instructions) to get log2?
Here is the AVX512F version using repeated squaring:
    static __m512d pown_AVX512(__m512d base, __m512i pexp) {
        __m512d result = _mm512_set1_pd(1.0);
        __m512i onei = _mm512_set1_epi32(1);
        __mmask8 mask;
        do {
            // mask2 marks the lanes whose lowest exponent bit is 1
            __m512i t1 = _mm512_and_epi32(pexp, onei);
            __mmask8 mask2 = (__mmask8)_mm512_cmp_epi32_mask(onei, t1, _MM_CMPINT_EQ);
            // multiply result by base only in those lanes
            result = _mm512_mask_mul_pd(result, mask2, result, base);
            pexp = _mm512_srli_epi32(pexp, 1);
            base = _mm512_mul_pd(base,base);
            // keep looping while any of the low eight exponents is nonzero
            mask = (__mmask8)_mm512_test_epi32_mask(pexp, pexp);
        } while(mask);
        return result;
    }
The exponents are int32, not int64. Ideally I would use __m256i for the eight integers. However, this requires AVX512VL, which extends the 512-bit operations to 256 and 128 bits, and KNL does not have AVX512VL. Instead I use the 512-bit operations on 32-bit integers and cast the 16-bit mask to 8 bits.
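To illustrate the mask truncation just described, here is a portable sketch that models the registers with plain arrays: 16 uint32_t lanes stand in for the __m512i and 8 double lanes for the __m512d. The function name pown_lanes_model and the array layout are assumptions made for illustration, not part of any intrinsic API. Only the low eight integer lanes hold real exponents, so whatever sits in the upper eight lanes is dropped when the 16-bit mask is narrowed to 8 bits.

```c
#include <stdint.h>

enum { INT_LANES = 16, DBL_LANES = 8 };

/* Hypothetical portable model of the masked loop. base_in has 8 doubles,
   pexp_in has 16 exponents of which only the low 8 are meaningful. */
static void pown_lanes_model(const double base_in[DBL_LANES],
                             const uint32_t pexp_in[INT_LANES],
                             double result[DBL_LANES]) {
    double base[DBL_LANES];
    uint32_t pexp[INT_LANES];
    for (int i = 0; i < DBL_LANES; ++i) { base[i] = base_in[i]; result[i] = 1.0; }
    for (int i = 0; i < INT_LANES; ++i) pexp[i] = pexp_in[i];

    uint8_t mask;
    do {
        /* 16-bit compare mask: bit i set when lane i has an odd exponent */
        uint16_t odd16 = 0;
        for (int i = 0; i < INT_LANES; ++i)
            if (pexp[i] & 1u) odd16 |= (uint16_t)(1u << i);
        uint8_t odd = (uint8_t)odd16;   /* narrowing keeps lanes 0..7 only */

        for (int i = 0; i < DBL_LANES; ++i) {
            if (odd & (1u << i)) result[i] *= base[i];  /* masked multiply */
            base[i] *= base[i];                         /* square every lane */
        }

        /* shift all 16 lanes, then build the 16-bit "still nonzero" mask */
        uint16_t nonzero16 = 0;
        for (int i = 0; i < INT_LANES; ++i) {
            pexp[i] >>= 1;
            if (pexp[i]) nonzero16 |= (uint16_t)(1u << i);
        }
        mask = (uint8_t)nonzero16;      /* upper lanes cannot keep the loop alive */
    } while (mask);
}
```

In this model the loop stops as soon as the low eight exponents reach zero, regardless of the contents of the upper eight lanes.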