
Background: I am working on an embedded application written in C, built with the IAR Embedded Workbench IDE and toolchain, that runs on an STM32F091 (ARM Cortex-M0 core) microcontroller. The application writes data to the microcontroller's embedded flash memory, which can only be programmed in 32-bit words (perhaps half-words work as well).

Problem description: The data is stored in a uint8_t byte array, preceded by some header information (in this case an AT response code from an on-board modem) that should not be written to flash. I'd like to pass a uint32_t pointer to where the actual data starts in the uint8_t buffer. But if this offset is not 4-byte aligned, my application crashes, since it tries to access an unaligned uint32_t.

This describes what I'm trying to do (not the real code, just an example):

uint8_t modemResponseBuffer[MAX_MODEM_RESPONSE_SIZE];

/* Get the modem response data (including modem response header data) */
size_t modemResponseSize = GetModemResponseData(modemResponseBuffer);

/* Get the actual data size from the header information */
size_t dataSize = GetActualDataSizeFromModemResponseHeader(modemResponseBuffer);

/* Get the offset to where the actual data starts in the modem response */
size_t modemDataOffset = GetModemResponseDataOffset(modemResponseBuffer);

/* Write the data part of the response to embedded flash memory.
   modemDataOffset can be any number, which breaks 4-byte alignment */
ProgramFlashMemory(DATA_FLASH_STORAGE_ADDRESS, (uint32_t*)&modemResponseBuffer[modemDataOffset],
 dataSize);

Inside the ProgramFlashMemory function, the FLASH_ProgramWord Standard Peripheral Library function is called in a loop.

Question(s): How do I solve this problem efficiently? I'm working on a system with a limited amount of memory (32 KB RAM), so I would prefer not to copy the desired contents from the uint8_t buffer into a new buffer of type uint32_t. At the moment I align the data manually by looping through it byte by byte, but this seems rather clumsy to me. I have yet to come up with a better solution, though, and I am interested in what suggestions I might receive here.

Also, if someone knows: why does the application crash in this case? What is the reason my core (or any core?) can't handle unaligned data types?

Stenis
  • Do you need the header data? – alk Aug 08 '15 at 11:26
  • Well I need it to identify the type of data the application is receiving and the size of it. I'm sorry if that was unclear from the example I made to show the problem. – Stenis Aug 08 '15 at 11:32
  • Do you need the header after having written to the flash-memory? – alk Aug 08 '15 at 11:39
  • While wording my answer this question came to my mind: Are you sure `ProgramFlashMemory ()` expects the number of bytes (`uint8_t`) and not the number of `uint32_t`? – alk Aug 08 '15 at 11:54
  • "*... my application crashes since it tries to access an unaligned uint32_t type.*" from what do you conclude this? – alk Aug 08 '15 at 11:56
  • Well, the code ends up running in the HardFault handler function. – Stenis Aug 08 '15 at 13:12
  • Your compiler/linker should provide a way to align data. Like `#pragma DATA_ALIGN` found in TI cg6x compiler. – user3528438 Aug 08 '15 at 15:07
  • Or you can `union` it with a `uint32_t` or a `uint32_t[MAX_MODEM_RESPONSE_SIZE/4]`, either one will force it to `uint32_t` alignment. – user3528438 Aug 08 '15 at 15:10
  • Read the compiler's user manual. It's common practice to align static allocated memory to a certain boundary, usually cache line boundary or double-word boundary, for performance and other reasons. Your compiler should provide it. – user3528438 Aug 08 '15 at 15:13
  • 32KiB RAM is **limited**? Oh, please do not look at the smaller MCUs then; you will be shocked. :-) – too honest for this site Aug 08 '15 at 16:59
  • Use proper serialisation instead of wild casting. You might run into more than alignment problems, e.g. endianness. – too honest for this site Aug 08 '15 at 17:02
  • @user3528438: Heard about C11? `stdalign.h`? – too honest for this site Aug 09 '15 at 01:03
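The buffer-alignment suggestions in the comments above can be sketched as follows, assuming a C11 compiler for the `alignas` variant (the union variant also works in C99). `MAX_MODEM_RESPONSE_SIZE` is given an assumed value here only so the snippet compiles, and the buffer names are hypothetical. Note that this aligns only the *start* of the buffer: `&buffer[offset]` is still unaligned unless `offset` is a multiple of 4, so it helps mainly when the payload is moved to the front of the buffer.

```c
#include <stdalign.h>
#include <stdint.h>

#define MAX_MODEM_RESPONSE_SIZE 2048u  /* assumed value, for illustration only */

/* Option 1 (C11): request uint32_t alignment for a byte array directly. */
static alignas(uint32_t) uint8_t modemResponseBufferC11[MAX_MODEM_RESPONSE_SIZE];

/* Option 2 (C99): a union is aligned for its most strictly aligned member,
 * so the byte view inherits the alignment of the uint32_t array. */
typedef union
{
    uint8_t  bytes[MAX_MODEM_RESPONSE_SIZE];
    uint32_t words[(MAX_MODEM_RESPONSE_SIZE + 3u) / 4u];
} ModemResponseBuffer;

static ModemResponseBuffer modemResponseBufferUnion;
```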

3 Answers


Change ProgramFlashMemory() to take a void*, then internally cast that to a uint8_t*, which you then iterate over, taking four bytes at a time into a uint32_t that you then write to the flash.

The void* parameter allows the address of any object to be passed without an explicit cast.

Something like:

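#include <stdint.h>

// Hypothetical word-programming routine, e.g. a wrapper around the
// Standard Peripheral Library's FLASH_ProgramWord(Address, Data).
void ProgramFlashWord( uint32_t address, uint32_t value ) ;

int ProgramFlashMemory( uint32_t* addr, const void* data, int length )
{
    const uint8_t* src = (const uint8_t*)data ;
    int byte = 0 ;
    int word = 0 ;

    while( byte < length )
    {
        uint32_t flash_word = 0 ;

        // Copy four bytes to word
        // Note: little-endian byte order assumed,
        //       reverse for big-endian.
        // If end is not aligned, fill with 0xff (the erased state of flash).
        for( int b = 0; b < 4; b++ )
        {
            uint32_t value = byte < length ? src[byte] : 0xFFu ;
            flash_word |= value << (b << 3) ;
            byte++ ;
        }

        // Pass the destination address, not the value stored there.
        ProgramFlashWord( (uint32_t)&addr[word], flash_word ) ;
        word++ ;
    }

    // Return bytes written - may be longer than `length` by up to
    // three for end-alignment.
    return byte ;
}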

You may want to keep the original ProgramFlashMemory() for efficient aligned writes, in which case the new function might be named ProgramFlashMemoryUnaligned(). Note that you need to handle not only the alignment but also a length that is not necessarily divisible by four.

Clifford
  • This looks excellent, will try out making my own ProgramFlashMemoryUnaligned next week. Perhaps even making a CheckIfAligned function as well. – Stenis Aug 09 '15 at 08:05
  • A potential problem with dynamically checking for alignment and using different functions to perform the programming, is that your application performance can vary non-deterministically, which in hard real-time applications may be undesirable. My proposal was to use the aligned operation for objects that were always aligned, and the unaligned for when they *may* be unaligned - simply so that performance is deterministic. – Clifford Aug 09 '15 at 08:18
  • ... That said on STM32 it is largely academic - programming on-chip flash on STM32 stalls the bus and therefore blocks instruction fetch for considerable periods (up to 40ms on STM32F1 and a huge 800ms on STM32F2!), so on-chip flash memory programming on STM32 is already non-deterministic and unsuited to hard real-time applications. You should be very careful of that "feature". – Clifford Aug 09 '15 at 08:20
  • Thanks for the heads up, but I've gathered as much already about flash programming and core stalling. It is not a real-time system per se. I've thought about modifying the Standard Peripheral Library code for flash programming and running it from RAM instead to avoid this issue, but it is not a necessity at the moment. – Stenis Aug 09 '15 at 08:25
  • There seem to be some misconceptions about what "hard real-time" means here. By definition, having a hard real-time requirement means missing a deadline is considered a total system failure. As long as your system-level design can afford to account for the worst-case programming time *every* time, you could still technically use this feature in a hard real-time system. That said, self-updating your own flash while operational is almost always a no-no for any system developed to hard real-time specifications; when NOT operational, exceptions can be made. – Brian McFarland Aug 10 '15 at 22:35

This assumes the header data is not needed anymore after having written the payload data.

To make sure the buffer's alignment is correct you might want to declare it like this:

uint32_t modemResponseBuffer[(MAX_MODEM_RESPONSE_SIZE * sizeof (uint8_t) / sizeof (uint32_t)) + 1];

Just move the payload data to the beginning of the buffer prior to calling the writer-function:

memmove(modemResponseBuffer, (uint8_t *)modemResponseBuffer + modemDataOffset, dataSize);

Please note that memcpy() would not work here as destination and source overlap.

Then call the writer like this:

ProgramFlashMemory(DATA_FLASH_STORAGE_ADDRESS, modemResponseBuffer, dataSize);
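A host-testable sketch of these steps follows; `CompactPayload` is a hypothetical helper name. Because the buffer is declared as `uint32_t`, the byte offset has to be applied through a `uint8_t *` view: adding the offset to the `uint32_t` array itself would advance by whole words instead of bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Moves dataSize payload bytes found at byte offset dataOffset down to the
 * start of the word-aligned buffer and returns the (aligned) buffer pointer.
 * The header bytes are overwritten in the process. */
static uint32_t *CompactPayload(uint32_t *buffer, size_t dataOffset, size_t dataSize)
{
    uint8_t *bytes = (uint8_t *)buffer;            /* byte view of the buffer */
    memmove(bytes, bytes + dataOffset, dataSize);  /* regions may overlap */
    return buffer;
}
```

The returned pointer can then be passed to `ProgramFlashMemory()` as in the last step above.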
alk
  • Thanks for your suggestion, using memmove will likely work. However, since the memory overlaps (as you noted) memmove will create a temporary copy of the data. This is not desirable as the data may be up to 2048 bytes. – Stenis Aug 08 '15 at 13:11
  • @Stenis: Then use a loop of `memcpy()` calls, copying only as many bytes per iteration as the header is long. – alk Aug 08 '15 at 13:13
  • Yes, I have a similar solution in place at the moment, but it only copies bytewise. Thinking it might be the best way to do it after all, with the modification you suggested. – Stenis Aug 08 '15 at 13:16
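The loop-of-`memcpy()` idea from the comments above can be sketched as below (`ShiftDownByChunks` is a hypothetical name). Each chunk is at most `offset` bytes long, so the source and destination of every individual `memcpy()` call are disjoint and no temporary buffer is needed. For what it's worth, the C standard describes `memmove()` as behaving *as if* the bytes went through a temporary array, but typical library implementations copy forwards or backwards in place rather than allocating one.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Shifts `size` bytes located at buf + offset down to buf[0] using
 * memcpy() on chunks of at most `offset` bytes. Within each call the
 * destination ends where the source begins, so the regions never overlap,
 * and each source chunk is read before any later write reaches it. */
static void ShiftDownByChunks(uint8_t *buf, size_t offset, size_t size)
{
    if (offset == 0u)
    {
        return;  /* data already starts at buf[0] */
    }
    for (size_t done = 0u; done < size; done += offset)
    {
        size_t chunk = size - done;
        if (chunk > offset)
        {
            chunk = offset;
        }
        memcpy(buf + done, buf + done + offset, chunk);
    }
}
```

With a very short header the chunks are tiny, so this approaches a byte-by-byte copy; the gain over a plain byte loop grows with the header length.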

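The reason the ARM Cortex-M0 core crashes on the unaligned access is that it is designed to: the Cortex-M0 (ARMv6-M architecture) does not support unaligned accesses, so any unaligned word or halfword load or store raises a HardFault. Taking the hard fault exception is actually an improvement over some older cores, which would access the value incorrectly and then continue executing with the corrupt value. Some newer ARM cores do have limited hardware support for unaligned accesses (for example, the Cortex-M3 and Cortex-M4 allow unaligned single loads and stores such as LDR and STR, but not multiple or doubleword accesses such as LDM, STM, or LDRD).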
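To see in advance which accesses will fault, the address can be checked before casting. A minimal sketch, where `IsWordAligned` is a hypothetical helper name: on a Cortex-M0, reading or writing through a `uint32_t *` is only safe when this returns true.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if p may be accessed as a uint32_t on a core
 * that requires natural alignment, i.e. the address is a multiple of 4. */
static bool IsWordAligned(const void *p)
{
    return ((uintptr_t)p & (sizeof(uint32_t) - 1u)) == 0u;
}
```

For example, if `modemResponseBuffer` starts on a word boundary and the header is 6 bytes long, `IsWordAligned(&modemResponseBuffer[6])` returns false, and reading through that address cast to `uint32_t *` is exactly the kind of access that ends in the HardFault handler.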

Here are a couple of suggestions.

Redesign ProgramFlashMemory() so that it accepts a uint8_t* rather than a uint32_t*. In order to program words it should copy the individual bytes from the buffer into a local variable which has the proper alignment. Then write the local variable copy to flash.
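A minimal sketch of that local-copy step, using a hypothetical helper named `LoadWordUnaligned`: `memcpy()` into an aligned local `uint32_t` is well-defined for any source address, and on a core without unaligned-access support the compiler typically emits byte loads (or a library call) instead of a single word load. The result is in the CPU's native byte order, which is little-endian on the Cortex-M0.

```c
#include <stdint.h>
#include <string.h>

/* Reads a 32-bit word from a possibly unaligned byte address by copying
 * it into a properly aligned local variable. */
static uint32_t LoadWordUnaligned(const uint8_t *src)
{
    uint32_t word;                      /* aligned by the compiler */
    memcpy(&word, src, sizeof word);    /* byte-wise safe copy */
    return word;
}
```

Inside a byte-pointer version of `ProgramFlashMemory()`, each iteration could then call `FLASH_ProgramWord(address + i, LoadWordUnaligned(&data[i]))` for `i = 0, 4, 8, ...`, with a trailing partial word handled separately.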

Redesign GetModemResponseData() so that it parses the header as the header is being read from the stream. It will determine the length of the header and the starting point of the data before the data is read from the stream. When the stream is at the start of the data it should begin copying the data into a fresh buffer that is separate from the header and properly aligned.
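A rough sketch of this second suggestion as a byte-at-a-time state machine, for example fed from the UART receive path. The `\r\n` header terminator, the buffer sizes, and all names (`ModemResponse`, `ModemResponseInit`, `ModemResponseFeed`) are assumptions for illustration; the modem's real header format is not given in the question, and real code would also extract the payload length from the header. Because the payload is stored in a `uint32_t` array, it starts on a word boundary.

```c
#include <stddef.h>
#include <stdint.h>

#define HEADER_MAX 32u  /* assumed maximum header length */
#define DATA_MAX   64u  /* assumed maximum payload length, multiple of 4 */

typedef struct
{
    uint8_t  header[HEADER_MAX];
    size_t   headerLen;
    int      headerDone;            /* nonzero once "\r\n" has been seen */
    uint32_t data[DATA_MAX / 4u];   /* uint32_t storage => word-aligned */
    size_t   dataLen;               /* payload length in bytes */
} ModemResponse;

static void ModemResponseInit(ModemResponse *r)
{
    r->headerLen = 0u;
    r->headerDone = 0;
    r->dataLen = 0u;
}

/* Feeds one received byte. Header bytes go to r->header until the assumed
 * "\r\n" terminator; every later byte goes straight into the aligned data
 * buffer. Returns 0 on success, -1 if a buffer would overflow. */
static int ModemResponseFeed(ModemResponse *r, uint8_t byte)
{
    if (!r->headerDone)
    {
        if (r->headerLen >= HEADER_MAX)
        {
            return -1;
        }
        r->header[r->headerLen++] = byte;
        if (r->headerLen >= 2u &&
            r->header[r->headerLen - 2u] == '\r' &&
            r->header[r->headerLen - 1u] == '\n')
        {
            r->headerDone = 1;
        }
        return 0;
    }
    if (r->dataLen >= DATA_MAX)
    {
        return -1;
    }
    ((uint8_t *)r->data)[r->dataLen++] = byte;
    return 0;
}
```

Once the response is complete, `data` and `dataLen` can be handed to a word-based flash writer without moving the payload again.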

kkrambo
  • I appreciate your suggestion, it would surely be a viable option to keep memory usage low (not minimal, but low enough) by using separate buffers for header and data. – Stenis Aug 09 '15 at 08:14
  • If others are interested in data alignment and CPUs, I found a dedicated question on StackOverflow here: http://stackoverflow.com/questions/3025125/cpu-and-data-alignment – Stenis Aug 09 '15 at 08:31