Any Faster RMS Value Calculation in C?

Question

I am writing a software for a small 8-bit microcontroller in C. Part of the code is to read the ADC value of a current transformer (ZCT), and then calculate the RMS value. The current flowing through the ZCT is sinusoidal but it can be distorted. My code as follow:

float       adc_value, inst_current;
float       acc_load_current;           // accumulator = (I1*I1 + I2*I2 + ... + In*In)
double      rms_current;

// Calculate the real instantanous value from the ADC reading
inst_current = (adc_value/1024)*2.5;    // 10bit ADC, Voltage ref. 2.5V, so formula is: x=(adc/1024)*2.5V                           

// Update the RMS value with the new instananous value:
// Substract 1 sample from the accumulator (sample size is 512, so divide accumulator by 512 and substract it from the accumulator)
acc_load_current -= (acc_load_current / 512);       
inst_current *= inst_current;          // square the instantanous current
acc_load_current += inst_current;      // Add it to the accumulator

rms_current = (acc_load_current / 512);  // Get the mean square value. (sample size is 512)
rms_current = sqrt(rms_current);         // Get RMS value

// Now the < rms_current > is the real RMS current

However, it has many floating point calculations. This adds a large burden to my small MCU. And I found that the sqrt() function does not work in my compiler.

Is there any code that could run faster?

How accurate does it need to be (can you just find the peak and do `RMS = peak_adc_value/1024*2.5*0.707`)? — Brendan, Mar 02 '15 at 10:39
Since you appear to use a sliding window average (which does not change a lot between samples), you could implement the sqrt() by a one-step Newton-Ralfston, using the old sqrt as a starting value. — joop, Mar 02 '15 at 10:45
Generally a look-up table is used to store the square root instead of calculating it.(In low resource environment for faster access) — Vagish, Mar 02 '15 at 10:50
@Vagish A lookup table implies binary search, which is O(log N). Newton-Ralfston converges in 4or5 iterations (but has a larger big O) — joop, Mar 02 '15 at 11:01
@joop,sorry I meant look up table means simple array stored in ROM. — Vagish, Mar 02 '15 at 11:20
What MCU exactly are you using? Does it sipport floating point in hardware? If not, you should probably switch this over to fixed point. Many will have hardware level instructions designed to make this sort of computation fast. — Degustaf, Mar 02 '15 at 14:55
My tired head spent a long while figuring out how would one go about calculating the value of [RMS](http://en.wikipedia.org/wiki/Richard_Matthew_Stallman)... — , Mar 03 '15 at 10:06
Apart from the good algorithmic approaches in the answers (Newton Raphson and fixed point, note that of course you can do N-R in fixed point, too), you should also take a look at your compiler's output. Depending on the compiler and the instruction set of the MCU, you may find hairy things in there which can be straightened out by appropriately tailored C code with appropriately chosen types (especially in fixed point). For performance usually the "roll your own" principle works well for math. — Jubatian, Mar 10 '15 at 15:01

Edgar Bonet · Accepted Answer · 2015-03-03T08:57:39.313

9

When you need to get faster on an processor that lacks an FPU, your best bet is to replace floating point calculations with fixed point. Combine this with joop's suggestion (a one Newton-Raphson sqrt) and you get something like this:

#define INITIAL 512  /* Initial value of the filter memory. */
#define SAMPLES 512

uint16_t rms_filter(uint16_t sample)
{
    static uint16_t rms = INITIAL;
    static uint32_t sum_squares = 1UL * SAMPLES * INITIAL * INITIAL;

    sum_squares -= sum_squares / SAMPLES;
    sum_squares += (uint32_t) sample * sample;
    if (rms == 0) rms = 1;    /* do not divide by zero */
    rms = (rms + sum_squares / SAMPLES / rms) / 2;
    return rms;
}

Just run your raw ADC samples through this filter. You may add a few bit-shifts here and there to get more resolution, but you have to be careful not to overflow your variables. And I doubt you really need the extra resolution.

The output of the filter is in the same unit as its input. In this case, it is the unit of your ADC: 2.5 V / 1024 ≈ 2.44 mV. If you can keep this unit in subsequent calculations, you will save cycles by avoiding unnecessary conversions. If you really need the value to be in volts (it may be an I/O requirement), then you will have to convert to floating point. If you want millivolts, you can stay in the integer realm:

uint16_t rms_in_mV = rms_filter(raw_sample) * 160000UL >> 16;

edited Mar 03 '15 at 08:57

answered Mar 02 '15 at 14:32

Edgar Bonet

3,416
15
18

Thanks for reply. I will study the fixed point calculation. I suppose if I input my raw ADC values to this filter, the return of this filter is the RMS value of the ADC data, right? Therefore I still need to calculate the real RMS current/voltage value from the RMS ADC result by fixed point method (e.g. rms_voltage = (rms / 1024) * 2.5V), is it true? – eepty Mar 03 '15 at 01:57
1

@eepty: you suppose right (answer updated). You need not calculate the real RMS voltage because the output of the filter _is the real voltage_, only in a non conventional unit. Whether you need to change units, I cannot say, it all depends on the requirements of your application: if you want to display the result, you will want a more conventional unit. If it is some sort of process control, you can keep whatever unit is convenient for the calculations. – Edgar Bonet Mar 03 '15 at 08:58
@EdgarBonet Why sum_squares -= sum_squares / SAMPLES;? I don't understand how this is subtracting one sample, I mean mathematically proved. It's actually subtracting the average sum, isn't it? – judoka_acl May 09 '16 at 11:00
1

@lalamer: It's an [exponential moving average](https://en.wikipedia.org/wiki/Exponential_moving_average) scaled by a factor `SAMPLES` and with a time constant of `SAMPLES` sampling times. With the notations of the linked Wikipedia article: `S_t = Y_t + (1−α)⋅S_{t−1}`, with `α = 1/SAMPLES`. – Edgar Bonet May 09 '16 at 15:03
@EdgarBonet can you take a look at related question please? https://stackoverflow.com/questions/61698417/rms-calculation-dc-offset – NStorm May 11 '20 at 07:31

joop · Answer 2 · 2015-03-02T11:52:21.063

Since your sum-of-squares value acc_load_current does not vary very much between iterations, its square root will be almost constant. A Newton-Raphson sqrt() function normally converges in only a few iterations. By using one iteration per step, the computation is smeared out.

static double one_step_newton_raphson_sqrt(double val, double hint)
{
double probe;
if (hint <= 0) return val /2;
probe = val / hint;
return (probe+hint) /2;
}

static double      acc_load_current = 0.0;           // accumulator = (I1*I1 + I2*I2 + ... + In*In)
static double      rms_current = 1.0;
float       adc_value, inst_current;
double      tmp_rms_current;

// Calculate the real instantanous value from the ADC reading
inst_current = (adc_value/1024)*2.5;    // 10bit ADC, Voltage ref. 2.5V, so formula is: x=(adc/1024)*2.5V                           

// Update the RMS value with the new instananous value:
// Substract 1 sample from the accumulator (sample size is 512, so divide accumulator by 512 and substract it from the accumulator)
acc_load_current -= (acc_load_current / 512);
inst_current *= inst_current;          // square the instantanous current
acc_load_current += inst_current;      // Add it to the accumulator

tmp_rms_current = (acc_load_current / 512);
rms_current = one_step_newton_raphson_sqrt(tmp_rms_current, rms_current);         // Polish RMS value

// Now the <rms_current> is the APPROXIMATE RMS current

Notes:

I changed some of the data types from float to double (which is normal on a general purpose machine/desktop) If double is very expensive on your microcomputer you could change them back.
I also added static, because I did not know if the code was from a function or from a loop.
I made the function static to force it to be inlined. If the compiler does not inline static functions, you should inline it manually.

Ooops `ph`, I guess ... (I'm Dutch, we don't have ph; most of the words use `f` ) — joop, Mar 02 '15 at 11:23
OK, but the guy's name was actually Raphson (no S, no T). ;) — j_random_hacker, Mar 02 '15 at 11:26
I was too lazy to look it up. Now I still have to look it up. Consider it done. — joop, Mar 02 '15 at 11:29

score 0 · Answer 3 · answered Mar 02 '15 at 10:39

Hopefully your project is for measuring "Big" AC voltages ( and not something like 9v motor control. ) If this happens to be the case then you can cheat because your error can then be within accepted limits..

Do all of the maths in integer, and use a simple lookup map for the sqrt operation. ( which you can pre-calculate at startup, you would only REALLY need about ~600 odd values if you are doing 3 phase..

Also this begs the question do you ACTUALLY need VAC RMS or some other measure of power ? (for example can you get away with a simple box mean algorythm? )

If the input voltages are small, just do fixed-point arithmetic by shifting the "binary point" a few places to the right (i.e. shifting your inputs left a few places), then shifting it back again at the end (remembering to adjust after multiplications and divisions). — j_random_hacker, Mar 02 '15 at 11:22

Spektre · Answer 4 · 2017-05-30T10:51:09.887

divisions/multiplications by power of 2

can be done by changing the exponent only via bit mask operations and +,- so mask/extract the exponent to integer value then apply bias. After that add/sub the value log2(operand) and encode back to your double value
sqrt

how fast and accurate it should be? You can use binary search on fixed point or use sqrt(x)=pow(x,0.5)=exp2(0.5*log2(x)). Again on fixed point it is quite easy to implement. You can temporarily make double a fixed point by taking the mantissa and bit shift it to some known exponent around your used values + handle the offset or to 2^0 if you have enough bits ...

compute sqrt and then store back to double. If you do not need too big precision then you can stay on operand exponent and do the binary search directly on mantissa only.

[edit1] binary search in C++

//---------------------------------------------------------------------------
double f64_sqrt(double x)
    {
    const int h=1;      // may be platform dependent MSB/LSB order
    const int l=0;
    DWORD b;            // bit mask
    int e;              // exponent
    union               // semi result
        {
        double f;       // 64bit floating point
        DWORD u[2];     // 2x32 bit uint
        } y;
    // fabs
    y.f=x;
    y.u[h]&=0x7FFFFFFF; x=y.f;
    // set safe exponent (~ abs half)
    e=((y.u[h]>>20)&0x07FF)-1023;
    e/=2;               // here can use bit shift with copying MSB ...
    y.u[h]=((e+1023)&0x07FF)<<20;
    // mantisa=0
    y.u[l] =0x00000000;
    y.u[h]&=0xFFF00000;
    // correct exponent
    if (y.f*y.f>x) { e--; y.u[h]=((e+1023)&0x07FF)<<20; }
    // binary search
    for (b =0x00080000;b;b>>=1) { y.u[h]|=b; if (y.f*y.f>x) y.u[h]^=b; }
    for (b =0x80000000;b;b>>=1) { y.u[l]|=b; if (y.f*y.f>x) y.u[l]^=b; }
    return y.f;
    }
//---------------------------------------------------------------------------

it returns sqrt(abs(x)) the results match "math.h" implementation from mine C++ IDE (BDS2006 Turbo C++). Exponent starts at half of the x value and is corrected by 1 for values x>1.0 if needed. The rest is pretty obvious

Code is in C++ but it is still not optimized it can be surely improved ... If your platform does not know DWORD use unsigned int instead. If your platform does not support 32 bit integer types then chunk it to 4 x 16 bit values or 8 x 8 bit values. If you have 64 bit then use single value to speed up the process

Do not forget to handle exponent also as 11 bit .... so for 8 bit registers only use 2 ... The FPU operations can be avoided if you multiply and compare just mantissas as integers. Also the multiplication itself is cumulative so you can use previous sub-result.

[notes]

For the bit positions see wiki double precision floating point format

To divide by 512, provided IEEE-732 FP representation is being used, I'm pretty sure you could just treat the in-memory value as an integer and subtract a suitably-shifted constant directly from it. — j_random_hacker, Mar 02 '15 at 11:25
pow() will probably be more expensive than sqrt(); (possibly exception: powers of two, but this depends on the compiler detetectem these special cases) — joop, Mar 02 '15 at 11:28
@joop pow can achieve more precision then fixed point binary search so without knowing the target needs I mentioned both, compiler has nothing to do with this because: 1. most MCU's do not have FPU module so everything is software computed via functions... 2. compiler has no way to detect special case of generic function operands (unless the sqrt is integrated in language directly which is not this case) — Spektre, Mar 02 '15 at 12:11
@j_random_hacker did not occur to me ... but it seems it should work (+1) for nice idea — Spektre, Mar 02 '15 at 12:13
@joop have tried to implement binary search and after playing with exponent a bit the precision match the pow so you were right pow is slower even for this case at the same precision — Spektre, Mar 02 '15 at 13:02
@Spektre: it would be nice if you can explain your code of finding square root with some nice explanations. In current form it is not readable because of scant comments and sprinkled constants. Can you explain "binary search of fixed point"? — newbie_old, Jun 22 '15 at 00:08
@newbie_old this is not fixed point this is floating point so 1. first compute the results exponent (it is half of operands for sqrt hence the e/=2, 1023 is bias of exponent for `double`) 2. handle mantissa bits as integer for binary search so set the bits from MSB to LSB and clear if result is bigger then x just like in the integer bin search example I give you in your question) there are 2 `for`s because `double` fits into 2 32bit integers (first one skips the exponent and sign bits that is why starts from `0x00080000`. The binary search uses FPU aritmetics `(y.f*y.f)` — Spektre, Jun 22 '15 at 07:35

score 0 · Answer 5 · answered Mar 13 '15 at 08:59

0

To find the square root use the below app note from microchip for 8 bit controllers

Fast Integer Square Root

which is much fast and can find square root in just 9 loops

answered Mar 13 '15 at 08:59

0xAB1E

721
10
27

Any Faster RMS Value Calculation in C?

5 Answers5

Linked