I had this issue for a neural network I wanted to implement on a Raspberry Pi 3 (weights between -127 and 127), and the fastest method I found was a binary search implemented as nested if statements. Obviously the if statements needed to be autogenerated, and Python came to the rescue.
code
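Given a C function whose body is the autogenerated file:

```c
#include <stdint.h>

/* Signed return type, since the sigmoid values range from -127 to 127. */
static int16_t sigmoid_lookup(int32_t i) {
#include "autogen_sigmoid_index.i"
}
```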
and a Python list of (sigmoid_value, at_argument) pairs sorted by argument, this function creates the if-else tree:
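```python
def produce_c_code(numbers_list, idxs, idxe, level):
    """Yield C lines for an if-else tree over numbers_list[idxs:idxe].

    numbers_list holds (sigmoid_value, at_argument) pairs sorted by
    at_argument; level sets the indentation depth of the emitted code.
    """
    if idxs >= idxe:
        raise RuntimeError("idxs=%d idxe=%d" % (idxs, idxe))
    indent = "    " * level
    if idxs + 1 == idxe:  # end of recursion: a single entry is left
        yield indent + "return %d;" % numbers_list[idxs][0]
    else:
        idxm = (idxe + idxs) // 2
        # i at or above the middle threshold: the answer lies in [idxm, idxe)
        yield indent + "if(i>=%d)" % numbers_list[idxm][1]
        yield from produce_c_code(numbers_list, idxm, idxe, level + 1)
        yield indent + "else"
        yield from produce_c_code(numbers_list, idxs, idxm, level + 1)
```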
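Since every comparison halves the index range, a table with N entries needs at most ceil(log2 N) comparisons per lookup. The sketch below (leaf_depths is a hypothetical helper, not part of the generator) replays the same midpoint splits as produce_c_code and counts the comparisons on the way to each return statement; using 255 entries is an assumption that matches outputs from -127 to 127.

```python
import math


def leaf_depths(idxs, idxe, depth=0):
    """Yield the number of comparisons needed to reach each leaf.

    Mirrors produce_c_code: a one-entry range is a leaf (a return
    statement); any larger range costs one comparison at its midpoint.
    """
    if idxe - idxs == 1:
        yield depth
    else:
        idxm = (idxs + idxe) // 2
        yield from leaf_depths(idxm, idxe, depth + 1)
        yield from leaf_depths(idxs, idxm, depth + 1)


# Assumed table size: one entry per output value from -127 to 127.
depths = list(leaf_depths(0, 255))
print(min(depths), max(depths), math.ceil(math.log2(255)))  # prints: 7 8 8
```

So every lookup in such a table costs 7 or 8 integer comparisons and no floating-point math.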
example
For this number list: [(0, 0), (1, 9), (2, 25), (3, 41), (4, 57), (5, 73), (6, 89)], the code produced is:
```c
if(i>=41)
    if(i>=73)
        if(i>=89)
            return 6;
        else
            return 5;
    else
        if(i>=57)
            return 4;
        else
            return 3;
else
    if(i>=9)
        if(i>=25)
            return 2;
        else
            return 1;
    else
        return 0;
```
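The generated tree is a binary search for the last entry whose threshold is not greater than i. As a sketch (the name reference_lookup is hypothetical), the same mapping can be written with Python's bisect module, which can serve as a reference when testing the C output:

```python
from bisect import bisect_right

# The example list from above: (sigmoid_value, at_argument) pairs.
numbers_list = [(0, 0), (1, 9), (2, 25), (3, 41), (4, 57), (5, 73), (6, 89)]
thresholds = [arg for _, arg in numbers_list]


def reference_lookup(i):
    """Return the value the generated if-else tree returns for i.

    The tree selects the last entry whose threshold is <= i; inputs below
    the first threshold fall through to entry 0 (the final 'return 0;').
    """
    idx = bisect_right(thresholds, i) - 1
    return numbers_list[max(idx, 0)][0]


for i in (-5, 8, 9, 40, 41, 88, 89, 1000):
    print(i, reference_lookup(i))
```

Feeding the same inputs to this function and to the compiled C lookup is one way to check the generator's output.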
benchmarks
The benchmarks use the sigmoid function from my case, 127 * n / sqrt(n*n + 4194304), over the input range [-8000000, 8000000].
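One possible way to build the (sigmoid_value, at_argument) list for this function is sketched below. It assumes the table values are the sigmoid rounded to the nearest integer (the thresholds 9, 25, 41, ... in the example list are consistent with that) and, because the function is monotonically increasing, binary-searches the smallest integer argument that reaches each output value. All function names here are hypothetical.

```python
import math

LO, HI = -8000000, 8000000  # input range used in the benchmarks


def sigmoid(n):
    # The sigmoid from the benchmarks: 127 * n / sqrt(n*n + 4194304)
    return 127 * n / math.sqrt(n * n + 4194304)


def quantized(n):
    # Assumption: table values are the sigmoid rounded to the nearest integer.
    return round(sigmoid(n))


def first_arg_reaching(value, lo, hi):
    """Smallest integer n in [lo, hi] with quantized(n) >= value.

    Valid because quantized() is non-decreasing; assumes quantized(hi) >= value.
    """
    while lo < hi:
        mid = (lo + hi) // 2
        if quantized(mid) >= value:
            hi = mid
        else:
            lo = mid + 1
    return lo


def build_numbers_list(lo=LO, hi=HI):
    """Return (sigmoid_value, at_argument) pairs sorted by argument."""
    return [(v, first_arg_reaching(v, lo, hi))
            for v in range(quantized(lo), quantized(hi) + 1)]


numbers_list = build_numbers_list()
print(len(numbers_list), numbers_list[:2], numbers_list[-1])
```

The result can then be passed as produce_c_code(numbers_list, 0, len(numbers_list), 1), with the yielded lines joined by newlines and written to autogen_sigmoid_index.i; starting at level 1 only indents the tree inside the function body.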
| CPU | CPU time, lookup | CPU time, math |
|---|---|---|
| Pentium M 1.2 GHz | 300000 | 1460000 |
| Raspberry Pi 2 800 MHz | 474094 | 2897385 |
| Raspberry Pi 3 1.2 GHz | 369665 | 1570066 |
| Intel Core™2 Q6600 2.4 GHz | 73623 | 797847 |
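As a quick reading of these numbers, dividing the math CPU time by the lookup CPU time gives the lookup's speedup on each machine; the snippet below only does that arithmetic on the reported figures.

```python
# CPU times reported above, as (lookup, math) pairs.
results = {
    "Pentium M 1.2 GHz": (300000, 1460000),
    "Raspberry Pi 2 800 MHz": (474094, 2897385),
    "Raspberry Pi 3 1.2 GHz": (369665, 1570066),
    "Intel Core™2 Q6600 2.4 GHz": (73623, 797847),
}

# Speedup = time spent computing the formula / time spent in the lookup.
speedups = {cpu: math_t / lookup_t for cpu, (lookup_t, math_t) in results.items()}
for cpu, s in speedups.items():
    print(f"{cpu}: lookup is {s:.1f}x faster")
```

On these machines the lookup comes out roughly 4 to 11 times faster than computing the formula.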