2

I'm programming an stm8s micro controller and I'm using STVD IDE and COSMIC compiler.

The result of a subtracting two uint32_t variables is saved in another uint32_t variable. Sometimes a weird value results from this process. This weird value is always the expected value with the most significant bits are set to 1s.

Here is a snippet of my code:

static uint32_t lastReceivedLed = 0;
uint32_t timeSinceLast = 0;

timeSinceLast = IL_TimTimeNow() - lastReceivedLed;

if(timeSinceLast > 2500U)
{
      Inhibitor = ACTIVE;  // HERE IS MY BREAKPOINT
}

Here is how IL_TimTimeNow() is defined:

volatile uint32_t IL_TimNow = 0;

uint32_t IL_TimTimeNow(void)
{
    return IL_TimNow; // Incremented in timer ISR
}

Here are some real values from a debugging session:

enter image description here

timeSinceLast should be 865280 - 865055 = 225 = 0xE1

However, the result calculated by the compiler is 4294967265 = 0xFFFFFFE1

Notice that the least significant byte is correct while the rest of the bytes are set to 1s in the compiler's result!!

Also notice that this situation only happens once in a while. Otherwise, it works perfectly as expected.

Is this an overflow? What can cause this situation?

glts
  • 21,808
  • 12
  • 73
  • 94
Salahuddin
  • 1,617
  • 3
  • 23
  • 37
  • are you sure there is a proper prototype of `IL_TimTimeNow()` in the source file where you call it? – Ingo Leonhardt Jun 13 '19 at 15:36
  • Where is your breakpoint set? If you stop the debugging at the wrong time, it's possible that the subtraction hasn't yet completed, and you're seeing an intermediate result from a 32-bit subtraction on an 8-bit processor. – Thomas Jager Jun 13 '19 at 15:37
  • @IngoLeonhardt Yes, I'm sure. I've included the header file which has the correct prototype. Thomas, I indicated in my code snippet where I put the breakpoint. – Salahuddin Jun 13 '19 at 15:41
  • @Salahuddin I'd suggest looking at the assembly generated. Optimization can change things, and C often does not line up cleanly with the generated assembly. – Thomas Jager Jun 13 '19 at 15:43
  • 1
    [Isnt it](https://stackoverflow.com/questions/20070374/how-do-i-set-a-uint32-to-its-maximum-value) `4294967265` in `uint32_t` == `0xFFFFFFFF` ? (and not `0xFFFFFFE1`) This info may change what to check when debugging this problem... or what is actually happening. – wendelbsilva Jun 13 '19 at 15:52
  • 2
    Since `IL_TimNow` keeps changing, its value shown in the debugger might not match the values that were subtracted. If you add another variable `uint32_t curTime;`, and replace `timeSinceLast = IL_TimTimeNow() - lastReceivedLed;` with `curTime = IL_TimTimeNow();` `timeSinceLast = curTime - lastReceivedLed;`, then you can see which values were actually subtracted. If it turns out that `curTime` is less than `lastReceivedLed`, that would explain the large value you are seeing after subtraction. – Ian Abbott Jun 13 '19 at 16:00
  • 1
    @wendelbsilva The number is `4294967265` not `4294967295` – Thomas Jager Jun 13 '19 at 16:08
  • 2
    Another thing is that while `IL_TimNow` is volatile, it's not atomically accessed. The value held in it could change in an ISR while it's being loaded to be returned from `IL_TimTimeNow`. – Thomas Jager Jun 13 '19 at 16:09
  • @IanAbbott This is a great point I never thought of. I think this is the root of this problem. I'm still investigating and I'll write once I'm sure if this is correct. Million thanks – Salahuddin Jun 13 '19 at 16:31
  • 1
    Yes, this is almost certainly a re-entrancy bug. Your ancient 8-bitter will struggle massively with 32 bit numbers. You must always protect variables shared with ISRs from race conditions. There exist almost no exceptions to this, no matter MCU. – Lundin Jun 13 '19 at 20:22

1 Answers1

7

The values shown in the debugger are:

  • IL_TimNow = 865280
  • lastReceivedLed = 865055
  • timeSinceLast = 4294967265

Note that 4294967265 is also what you get when you convert -31 to a uint32_t. This suggests that the value of IL_TimNow returned by IL_TimTimeNow() just before the subtraction was actually lastReceivedLed - 31, which is 865055 - 31, which is 865024.

The difference between the value of IL_TimNow shown in the debugger (865280), and the value of IL_TimNow just before the subtraction (865024), is 256. Moreover, the least-significant 8 bits of both values are all zero. This suggests that the value was being read just as the least-significant byte was wrapping round to 0 and the next byte was being incremented. The comment in IL_TimTimeNow() says // Incremented in timer ISR. Since the 8-bit microcontroller can only read one byte at a time, it seems that the timer ISR occurred while the four bytes of IL_TimNow were being read by the function.

There are two ways to solve the problem. The first way is to disable the timer interrupt in IL_TimTimeNow() while the value of IL_TimNow is being read. So the IL_TimTimeNow() function can be changed to something like this:

uint32_t IL_TimTimeNow(void)
{
    uint32_t curTime;

    disable_timer_interrupt();
    curTime = IL_TimNow;
    enable_timer_interrupt();
    return curTime;
}

However, you will need to check that disabling the timer interrupt temporarily only results in the interrupt being delayed, and not skipped altogether (otherwise you will lose timer ticks).

The other way to solve the problem is to keep reading IL_TimNow in IL_TimTimeNow() until you get two identical values. So the IL_TimTimeNow() function can be changed to something like this:

uint32_t IL_TimTimeNow(void)
{
    uint32_t prevTime, curTime;

    curTime = IL_TimNow;
    do
    {
         prevTime = curTime;
         curTime = IL_TimNow;
    } while (curTime != prevTime);
    return curTime;
}

There will usually be a single iteration of the do ... while loop, reading IL_TimNow twice. Occasionally, there will be two iterations of the loop, reading IL_TimNow three times. In practice, I wouldn't expect more than two iterations of the loop, but the function can handle that as well.

A less safe, but possibly slightly faster version of the above would be to only read IL_TimNow twice when the least-significant byte is 0:

uint32_t IL_TimTimeNow(void)
{
    uint32_t curTime;

    curTime = IL_TimNow;
    if ((curTime & 0xFF) == 0)
    {
        // Least significant byte possibly just wrapped to 0
        // so remaining bytes may be stale. Read it again to be sure.
        curTime = IL_TimNow;
    }
    return curTime;
}

If performance is not an issue, use one of the safer versions.

Ian Abbott
  • 15,083
  • 19
  • 33
  • "keep reading IL_TimNow in IL_TimTimeNow() until you get two identical values" This is completely unscientific and bad advise. The OP is not reading some physical input, they are reading a timer register. – Lundin Jun 13 '19 at 20:25
  • Also, it doesn't matter if it is a 8 bitter or 32 bitter, the former only exposes the bug quicker and makes it more frequent. When writing in C, there is never a guarantee that a 32 bit variable will be read in a single cycle, unless the compiler supports C11 `_Atomic`. You must always protect such variables from race conditions. Inline assembler is another option. – Lundin Jun 13 '19 at 20:26
  • @Lundin Could you please explain your first comment in more details? I don't understand what you mean. Also, if you have a faster and safer solution could you please share it? Thanks – Salahuddin Jun 14 '19 at 08:48
  • I think disabling and enabling interrupts is the right way to do it. The `do..while` way is not a good one becasue it can happen that the two values are corrupted. Yes, it is very unlikely to happen but it can. Also going in a `do..while()` without `timeout` is not a good idea in a real-time system. Maybe this is what @Lundin meant. – Salahuddin Jun 14 '19 at 11:05
  • @Salahuddin See [Using volatile in embedded C development](https://electronics.stackexchange.com/a/409570/6102). This also explains why you must declare the variable `volatile`. – Lundin Jun 14 '19 at 11:11
  • (Kind of funny how I constantly find myself fighting a holy war against the mediocre embedded programmer hordes over this bug, then forget it myself. This whole day I have been troubleshooting a timer that sporadically went over the expected time. The culprit was a `bool` variable shared with an ISR and not protected with semaphores. This was on a 32 bit ARM which has absolutely no problem reading a bool in a single cycle. I wrote a mutex in inline asm by slapping the interrupt flag during read, and now it behaves like a charm. One can't underestimate how subtle, common and severe this bug is) – Lundin Jun 14 '19 at 11:20
  • @Lundin "The OP is not reading some physical input, they are reading a timer register." No, OP is reading a volatile variable updated by a timer interrupt routine, according to the comments and variable declaration. The variable access is non-atomic, so could be updated while it is being read. The loop to check for two identical values is one way to filter out the partial updates for a single-core/single-threaded processor. I wouldn't expect more than 3 iterations unless there is an avalanche of interrupts going on. – Ian Abbott Jun 17 '19 at 17:23
  • @Salahuddin "it can happen that the two values are corrupted". Yes, but it's highly unlikely that they are both corrupted to the same value, since `IL_TimNow` is monotonically increasing (until it wraps around). It's highly unlikely that `IL_TimNow` will have incremented all the way round and back again during a single iteration of the loop. If it did, your real-time system would probably be in deep trouble anyway. :) – Ian Abbott Jun 17 '19 at 17:31
  • @Salahuddin Yes, disabling the interrupt is a good solution, but as stated in my answer, please make sure that this does not result in a system tick being missed altogether. – Ian Abbott Jun 17 '19 at 17:33
  • One obvious bug scenario is this: you read the shared variable `curTime = IL_TimNow;` then copy it into `prevTime`. Then you start reading `IL_TimNow` again, but an interrupt occurs during the read, corrupting curTime. This corrupted value happens to correspond to the value that the variable had previously. Your error check passes, but the new value in `IL_TimNow` is ignored and discarded. – Lundin Jun 18 '19 at 06:34
  • @Lundin, I agree that could happen if reading an ADC value for example, but `IL_TimNow` is monotonically increasing (until it wraps around). I can't see how `prevTime` and `curTime` could be corrupted to the same value unless `IL_TimNow` has cycled all the way round. – Ian Abbott Jun 18 '19 at 12:38
  • @Lundin, I am assuming the execution of `IL_TimTimeNow` is suspended during the execution of the timer ISR, which would be true for a single core. – Ian Abbott Jun 18 '19 at 13:05