I have an open-source project (https://github.com/WhiteFossa/yiff-l) that uses an STM32F103 MCU.

The firmware contains a lot of sprintf calls with float arguments, for example:

char buffer[32];
sprintf(buffer, "Power: %.1fW", power);

For a long time I used a fairly old arm-none-eabi xPack (IIRC it was version 9.3.1), and everything was fine.

Then I had to reinstall my Fedora, and of course I got a newer version of arm-none-eabi. My project stopped compiling. I fixed some code and now it compiles again, but it produces hardfaults on any sprintf with a float.

If I remove the -u _printf_float linker flag, the hardfaults are gone, but of course I don't get float representations in my strings.

I'm pretty sure that the problem is not in my code. I've even tried putting sprintf on the first line of main(), and I still have the same problem.

What can cause such weird behaviour? How can I debug it? (I have never debugged ARM disassembly before.)

P.S. I've tried installing older arm-none-eabi xPacks, but without any luck: the problem remains.

P.P.S. I've tried different optimization settings, switched between release and debug builds, and set 4x more stack in sections.ld, all without success.

Any help will be much appreciated.

  • *I'm pretty sure that the problem is not in my code* - I would start with the opposite assumption. – Eugene Sh. May 31 '21 at 13:43
  • *but of course I don't get float representations in my strings.* - what do you get? What is the return value of `sprintf` ? – Eugene Sh. May 31 '21 at 13:47
  • One simple check is to create a large buffer, call `sprintf` with the parameters you know trigger the hardfault, and check the length of the final string. [snprintf](https://www.cplusplus.com/reference/cstdio/snprintf/) is safer than `sprintf`. – V.Lorz May 31 '21 at 13:49
  • If you want more details about the hardfault, you can *break* into your hardfault handler and check the values stored on the stack to dig into the dumped register state. [Here](https://mcuoneclipse.com/2012/11/24/debugging-hard-faults-on-arm-cortex-m/) is an article that outlines the methodology, although it focuses on an MCU from a different manufacturer and a different toolchain. – V.Lorz May 31 '21 at 13:54
  • @Eugene Sh. In embedded programming for Cortex and other MCUs you can compile the libraries with features you don't need disabled. Floating-point conversion is expensive in terms of the code it puts into flash, so we end up cutting out whatever is not really needed. In some implementations `sprintf` just ignores the float format specifiers and gives no error. – V.Lorz May 31 '21 at 13:59
  • @V.Lorz I know. Hence the questions. – Eugene Sh. May 31 '21 at 13:59
  • Eugene, if I disable float support, I just get empty spaces in place of %f (as expected). As for my code, I just tried this: `int main(int argc, char* argv[]) { char buffer[128]; sprintf(buffer, "Power: %.1fW", 1.23f);` and I still get the error. Digging through the disassembly shows that it faults somewhere in `_dtoa_r` in Newlib. – Ань Каирри May 31 '21 at 14:01
  • Take a look here: https://community.nxp.com/t5/Kinetis-Design-Studio/Floating-Point-sprintf-Causes-Bus-Error/m-p/461898 or here: https://www.embeddedrelated.com/showthread/comp.arch.embedded/271867-1.php If it is really calling `malloc`, then it is just broken. It is not supposed to assume that a heap exists. – Eugene Sh. May 31 '21 at 14:03
  • Yes, it USES malloc(). I just tested it, and it calls it MANY times. It also uses a lot of stack (but there is no stack overflow, I've checked). – Ань Каирри May 31 '21 at 14:08
  • Then I guess you either need to implement some allocator for `malloc`, or use a custom sprintf implementation (which is what I would go with). Or just work with integers by scaling the floats up (for example, multiply by 1000 or more). – Eugene Sh. May 31 '21 at 14:11
  • Newlib uses malloc in its float printf code in some places, but it would be _really_ strange if a `malloc()` failure ended in a hardfault rather than in just a `-1` return value with `ENOMEM`. I suspect your stack overflows and malloc and the stack point to the same region.... Does just `int main() { printf("%0.1f", 1.0); }` result in a failure? Does it only happen with `sprintf`? Does it happen with `snprintf`? I have a Blue Pill lying around, and I believe Newlib float printing should work fine on it. You might peek at [this](https://stackoverflow.com/a/67398286). – KamilCuk May 31 '21 at 14:57
  • Yes, just main() with sprintf() fails, snprintf() fails too. – Ань Каирри May 31 '21 at 15:28
  • Finally fixed it by switching to https://github.com/mpaland/printf – Ань Каирри May 31 '21 at 16:13

3 Answers

Finding hardfaults is complex and can very often be frustrating. In most cases where I've faced hardfaults, I went straight to a hardware debugger. The one I use for Cortex-M devices is Segger's J-Link Pro. There are other models that are much cheaper.

What can cause such weird behaviour? As you have identified, the problem is masked when float support in sprintf is disabled. My best guess is that it is caused by a buffer overrun in your output buffer or by buggy library code. Some sprintf implementations use heap memory allocations and don't check for NULL returns. There are BSP implementations using FreeRTOS that require some more tuning to enable dynamic memory management.

If it hardfaults even using large buffers, then this might be caused by internal memory allocations in sprintf. You can try to use another library. To my best understanding, newlib-nano is known to cause this error in some scenarios.

In some cases it is a matter of increasing the global heap size and the stack size of the tasks where sprintf/snprintf are called.

How can I debug it? Probably the only solution would be to use a hardware debugger in combination with a full IDE like Eclipse. Here is a bit of information that will give you some guidelines. The J-Link Edu Mini hardware debugger is fairly affordable, and there are other options around that you would also find useful.
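
Once the debugger is attached, the registers that the core stacked at the moment of the fault are the most useful thing to look at. Below is a minimal sketch of decoding them, assuming the standard Cortex-M3 exception frame (r0-r3, r12, lr, pc, xPSR). The names `fault_frame_t`, `decode_fault_frame`, `hard_fault_c` and `g_last_fault` are hypothetical, and `HardFault_Handler` is the usual CMSIS startup name, which your vector table may spell differently.

```c
#include <stdint.h>

/* Registers a Cortex-M3 pushes on exception entry, lowest address first. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} fault_frame_t;

/* Hypothetical helper: copy the eight stacked words into a named struct. */
fault_frame_t decode_fault_frame(const uint32_t *sp)
{
    fault_frame_t f = { sp[0], sp[1], sp[2], sp[3], sp[4], sp[5], sp[6], sp[7] };
    return f;
}

#if defined(__ARM_ARCH_7M__)   /* target-only part, skipped on a host build */
volatile fault_frame_t g_last_fault;   /* inspect this in the debugger */

__attribute__((used)) void hard_fault_c(const uint32_t *sp)
{
    g_last_fault = decode_fault_frame(sp);
    for (;;) { }                       /* put a breakpoint here */
}

/* Bit 2 of EXC_RETURN (in lr) tells whether MSP or PSP holds the frame. */
__attribute__((naked)) void HardFault_Handler(void)
{
    __asm volatile(
        "tst   lr, #4        \n"
        "ite   eq            \n"
        "mrseq r0, msp       \n"
        "mrsne r0, psp       \n"
        "b     hard_fault_c  \n");
}
#endif
```

With the breakpoint hit, `g_last_fault.pc` normally points at or near the faulting instruction, and it can be looked up in the map file or the disassembly.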

My solution to this same problem on a similar CPU (Kinetis K22, formerly from Freescale, now NXP) was to write my own float-to-string conversion function. It increased performance noticeably (no more mallocs), the hardfault errors were gone, and I trimmed the per-task heap size to what I really needed.
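
The function I wrote is not shown here, but a minimal sketch of the idea could look like the following. It assumes single-precision input, a small fixed number of decimals and values whose scaled magnitude fits in 32 bits; `fmt_fixed` is a hypothetical name, and the rounding is a simple round-half-up in float arithmetic, so it will not match printf digit for digit in every case.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write `value` with `decimals` fractional digits
 * into `out` without touching the heap. Returns the number of characters
 * written (excluding the terminator), or -1 if the value is out of range
 * or the buffer is too small. */
int fmt_fixed(char *out, size_t size, float value, unsigned decimals)
{
    if (out == NULL || size == 0 || decimals > 6)
        return -1;

    uint32_t scale = 1;
    for (unsigned i = 0; i < decimals; i++)
        scale *= 10u;

    int negative = value < 0.0f;
    float mag = negative ? -value : value;
    float scaled_f = mag * (float)scale;
    if (!(scaled_f < 4.0e9f))          /* also rejects NaN and infinity */
        return -1;

    uint32_t scaled = (uint32_t)(scaled_f + 0.5f);   /* round half up */
    uint32_t whole = scaled / scale;
    uint32_t frac = scaled % scale;

    /* Build the text backwards in a small stack buffer. */
    char tmp[20];
    size_t n = 0;
    for (unsigned i = 0; i < decimals; i++) {
        tmp[n++] = (char)('0' + frac % 10u);
        frac /= 10u;
    }
    if (decimals > 0)
        tmp[n++] = '.';
    do {
        tmp[n++] = (char)('0' + whole % 10u);
        whole /= 10u;
    } while (whole != 0);
    if (negative)
        tmp[n++] = '-';

    if (n + 1 > size)
        return -1;
    for (size_t i = 0; i < n; i++)
        out[i] = tmp[n - 1 - i];
    out[n] = '\0';
    return (int)n;
}
```

The result can then be passed to an integer-only sprintf through `%s`, for example as `"Power: %sW"`, so the float path of the library is never linked in.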

V.Lorz
  • I'm going to go this way and use https://github.com/MarioViara/xprintfc. Also look at my comments: even sprintf on the first line of main() causes a failure, so there are no buffer overflows. Thanks for the detailed answer. And sure, I use a JTAG debugger. – Ань Каирри May 31 '21 at 14:34

I was finally able to get my firmware running with the help of https://github.com/mpaland/printf

In other words, I simply got rid of Newlib's sprintf() in my code.

  • Just FYI: last night I found a memory corruption issue in the very early initialization code of my app, so it's possible that this was the cause of the hardfault on sprintf(). – Ань Каирри Dec 25 '21 at 07:46

The only thing I can think of when looking at this code is that the value of power is huge:

double power = 11111111111111111111111111111111111111111111.2;
char buffer[32];
sprintf(buffer, "Power: %.1fW", power);

printf("%s\n", buffer);

If that isn't the case, there must be some undefined behaviour (UB) somewhere else.
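
To rule the huge-value case out, the call can be bounded with `snprintf` and its return value checked. A minimal sketch, assuming a C99-conforming `snprintf` with float support enabled; `format_power` is a hypothetical helper name:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format the power reading into `out`.
 * Returns 1 if the whole string fit, 0 if it was truncated or
 * snprintf reported an error. The buffer is never overrun. */
int format_power(char *out, size_t size, double power)
{
    /* snprintf returns the length the full string would have had. */
    int needed = snprintf(out, size, "Power: %.1fW", power);
    return needed >= 0 && (size_t)needed < size;
}
```

With a 32-byte buffer this returns 0 for the huge value above instead of writing past the end of the array, and the buffer still holds a terminated, truncated string.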

Serve Laurijssen
  • Can be rectified by using `snprintf` – Eugene Sh. May 31 '21 at 13:46
  • `int main(int argc, char* argv[]) { char buffer[128]; sprintf(buffer, "Power: %.1fW", 1.23f);` This code fails, and power isn't huge. I will look at snprintf(), but I suspect that it's not implemented in Newlib. – Ань Каирри May 31 '21 at 14:02
  • snprintf() fails just as hard as ordinary sprintf(): `int main(int argc, char* argv[]) { char buffer[128]; snprintf(buffer, 128, "Power: %.1fW", 1.23f);` I'm starting to think about switching to another sprintf() implementation (not from Newlib). – Ань Каирри May 31 '21 at 14:06