Environment: STM32H7 and GCC
I'm working with a stream of data: one sample is received over SPI every 250 us.
I compute a "triangle" weighted moving average over 256 samples, like this, but the middle sample is weighted 1 and the weights form a triangle around it.
My samples are stored in uint32_t val[256].
val is used as a circular buffer, indexed by a uint8_t write_idx.
The samples are 24 bits wide, so the maximum value of a sample is 0x00FFFFFF.
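    #include <stdint.h>

    uint8_t  write_idx = 0;   // wraps at 256 on purpose
    uint32_t val[256];        // circular buffer of 24-bit samples
    float    coef[256];       // triangle weights: 1/128 .. 1 .. 0
    uint32_t new_sample;      // latest sample from SPI (set elsewhere, not shown)

    void init(void)
    {
        uint8_t counter = 0;
        // I calculate my triangle coefs: ramp up to 128, then back down to 0
        for (uint16_t c = 0; c < 256; c++)
        {
            coef[c] = (c > 127) ? --counter : ++counter;
            coef[c] /= 128;
        }
    }

    void ACQ_Complete(void)
    {
        uint32_t moy = 0;
        // write_idx is meant to wrap
        val[write_idx++] = new_sample;
        // calc moving average; (uint8_t)(c - write_idx) is meant to wrap
        for (uint16_t c = 0; c < 256; c++)
            moy += (uint32_t)(val[c] * coef[(uint8_t)(c - write_idx)]);
        moy /= 128;   // the coefs sum to 128
    }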
I have to finish the calculation within the 250 us between two samples, but measuring with a debug GPIO pin shows that the "moy" part takes 252 us.
Code is simulated here
Interesting fact: if I remove the (uint32_t) cast near the end, it takes 274 us instead of 252 us.
How can I make it faster?
I was thinking of using uint32 instead of float for coef (by multiplying by 1000, for example), but my uint32 would overflow.
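
To make the integer idea concrete, here is a minimal sketch of what I have in mind, not something I have measured on the target. It assumes the weights are kept as integers 0..128 (the float coefs times 128) and that the sum goes into a 64-bit accumulator. The integer weights sum to 16384 (2^14), so with 24-bit samples the total stays below 2^38: too big for uint32, but fine for uint64. The names icoef, init_int and acq_complete_int are made up for this sketch.

```c
#include <stdint.h>

#define N 256u

static uint32_t val[N];     /* 24-bit samples, circular buffer           */
static uint16_t icoef[N];   /* integer triangle weights 1..128..0        */
static uint8_t  write_idx;  /* wraps at 256 on purpose                   */

/* Hypothetical integer counterpart of init(): same triangle, not divided by 128. */
void init_int(void)
{
    uint8_t counter = 0;
    for (uint16_t c = 0; c < N; c++)
        icoef[c] = (c > 127) ? --counter : ++counter;
}

/* Hypothetical integer counterpart of ACQ_Complete(); returns the average. */
uint32_t acq_complete_int(uint32_t new_sample)
{
    uint64_t acc = 0;                 /* max is (2^24 - 1) * 2^14 < 2^38 */

    val[write_idx++] = new_sample;    /* write_idx is meant to wrap      */

    /* (uint8_t)(c - write_idx) is meant to wrap, as in the float version */
    for (uint16_t c = 0; c < N; c++)
        acc += (uint64_t)val[c] * icoef[(uint8_t)(c - write_idx)];

    return (uint32_t)(acc >> 14);     /* divide by the weight sum, 16384 */
}
```

The result can differ slightly from the float version, because the float loop truncates every product before adding it, while this one truncates only once at the end. On a Cortex-M core the 32x32 multiply accumulating into 64 bits should map to a single UMLAL instruction, but whether that is enough to get under 250 us is something I would still have to measure.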