
We are writing real-time software in embedded C for the PowerPC 604.

What follows is a conversation between two of my colleagues. I'm having a difficult time understanding what they are talking about.

Employee 1:

In the PPC architecture, must integer alignment follow word boundaries? This relates to whether the pointer stack math/comparisons might be better served by casting to integer pointers instead of char pointers. If PPC guarantees word alignment, then seeing pointer values that are not word-aligned would be an additional checkable red flag, whereas char pointers could by their nature hold odd address values 3/4 of the time... just a thought that came to me... am I totally off-base?

Employee 2:

Only floating-point values must be on a 4-byte-aligned memory address. All other values do not have this requirement. This is why we have 4-byte alignment checks when parsing network packets (in which a value can be at any byte offset, as sent). WORD alignment is not guaranteed otherwise.
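Employee 2's point about packets is the classic case where misalignment is legitimate. A minimal sketch, assuming a C99 compiler; `read_be32` and `read_u32_raw` are hypothetical helpers, not code from the project under discussion. Both read a 32-bit field starting at any byte offset without ever performing a misaligned word load:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit big-endian (network byte order) field that may start at
 * any byte offset in a received packet. The value is assembled from
 * individual bytes, so no misaligned word load is ever issued and the
 * result is the same on any host byte order. */
uint32_t read_be32(const unsigned char *pkt, size_t offset)
{
    const unsigned char *p = pkt + offset;
    return ((uint32_t)p[0] << 24) |
           ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |
            (uint32_t)p[3];
}

/* Alternative: copy the bytes into a properly aligned local first.
 * memcpy places no alignment requirement on its arguments. The result
 * is the raw bytes reinterpreted in host byte order. */
uint32_t read_u32_raw(const unsigned char *pkt, size_t offset)
{
    uint32_t v;
    memcpy(&v, pkt + offset, sizeof v);
    return v;
}
```

For example, with `unsigned char pkt[] = {0xAA, 0x12, 0x34, 0x56, 0x78}`, `read_be32(pkt, 1)` yields `0x12345678` on any host.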

Employee 1:

I probably didn't state my issue satisfactorily. In the PPC architecture, pointers should generally have word-boundary values unless they are pointing to values in a vector of characters. The architecture makes every effort to align all non-vector values to word boundaries. This allows for an additional corruption check: if a pointer value is **not** on a four-byte boundary and does not point to an element in a packed struct, then it probably means data has been corrupted... That was my only point.

Employee 2:

I think you misunderstood my answer. Unless they changed it, which could very well be true, that is not the case. WORD alignment is not guaranteed, and a check against WORD alignment tells us nothing. The corruption check would not be possible this way. I specifically looked this up in the old documentation several years ago, and they certainly could have changed it. We would need to find proof of this, though. The only data type that is guaranteed to be placed on a word-aligned memory address is float, and that's a compiler option, not an architecture requirement.

Employee 1:

I've got "the proof" if you want to see it. Unless the data is explicitly packed or is a char vector index, it will reside at an address ending in [0,4,8,c] on PPC.
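Employee 1's "[0,4,8,c]" observation matches how compilers usually lay out ordinary (non-packed) structs. A sketch assuming a C11 compiler (for `_Alignof`); `struct record` and `record_members_aligned` are hypothetical. The standard only guarantees that each member is suitably aligned for its own type; the concrete offsets in the comments are typical, not required:

```c
#include <stddef.h>

/* Hypothetical record mixing member types. Without packing, the
 * compiler inserts padding so each member meets its own alignment
 * requirement. */
struct record {
    char  tag;    /* offset 0 */
    int   count;  /* typically offset 4 when _Alignof(int) == 4 */
    short flags;  /* typically offset 8 */
};

/* Returns nonzero if every member offset is a multiple of that
 * member's alignment requirement, which C guarantees for an ordinary
 * (non-packed) struct. */
int record_members_aligned(void)
{
    return offsetof(struct record, count) % _Alignof(int) == 0 &&
           offsetof(struct record, flags) % _Alignof(short) == 0;
}
```

Packing is requested through compiler-specific extensions, such as GCC's `__attribute__((packed))`, which is why packed structs are the exception Employee 1 names.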

I'm very confused. Vector? They must be referring to arrays. How much of this information makes sense, and how much of it is questionable?

What are the rules for data alignment and word boundaries?

What are they trying to determine?

Peter Mortensen
Trevor Hickey
  • Your question title should be revised. Word/byte boundaries are not an artifact of C; they are a consequence of a particular platform/ABI. – David Hoelzer May 17 '16 at 12:55
  • It depends in what language you are programming. If you are coding in C then UB is UB. This question only makes sense when you program in assembly. – user3528438 May 17 '16 at 13:05
  • _What are the rules for data alignment...?_ Always implementation-specific. I think Employee 2 is bringing up an interesting point about floating point being the only type whose boundaries are strictly enforced. That is the only part of the whole conversation that makes a little sense to me, when you consider how much the way floats are stored varies from system to system (_[e.g. this discussion](http://stackoverflow.com/q/6910115/645128)_). Interpreting the data stored for a float would be impossible without strict adherence to boundaries. – ryyker May 17 '16 at 13:31

2 Answers

I'm very confused. Vector? They must be referring to arrays.

By "vector of characters", the speaker appears to be referring to a contiguous sequence of char / unsigned char. That could correspond to a C array, but I suspect he uses the term "vector" in recognition of the fact that any block of contiguous memory can be viewed as a contiguous sequence of char, and that a char * can point at any char anywhere in such a sequence.
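To make the "sequence of char" view concrete, the hypothetical helper below walks every byte of an object through an `unsigned char *` and counts the byte addresses that are not multiples of 4. It assumes 4-byte words and that converting a pointer to `uintptr_t` yields its numeric address:

```c
#include <stddef.h>
#include <stdint.h>

/* View any object as a contiguous sequence of unsigned char. A char
 * pointer may legitimately point at every one of those bytes, so the
 * byte addresses take every residue modulo 4. Counts how many of them
 * are NOT multiples of 4 (the assumed word size). */
size_t count_unaligned_byte_addresses(const void *obj, size_t size)
{
    const unsigned char *bytes = obj;
    size_t n = 0;
    for (size_t i = 0; i < size; i++) {
        if ((uintptr_t)(bytes + i) % 4 != 0)
            n++;
    }
    return n;
}
```

For a `uint32_t words[4]` that starts on a 4-byte boundary, the count is 12 of 16, which is where Employee 1's "3/4 of the time" comes from.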

What are the rules for data alignment and word boundaries?

They vary with machine architecture. How that relates to C programs is an aspect of the C implementation. In a "hosted" environment, that is an aspect of the operating system's "application binary interface" (ABI), but for an embedded system you might be using a "freestanding" C implementation, in which case ABI isn't really a thing -- there's just the C implementation itself.
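Because the rules belong to the implementation, the most reliable source is the compiler itself. A sketch assuming a C11 compiler (older embedded toolchains may not support `_Alignof`) that prints the size and the alignment requirement this implementation uses for a few types; `type_table` and `print_type_table` are hypothetical names:

```c
#include <stddef.h>
#include <stdio.h>

/* One row per type: what this implementation reports about it. */
struct type_info {
    const char *name;
    size_t size;
    size_t align;
};

/* The values are whatever the compiler and target ABI chose; C fixes
 * none of them except that char has size and alignment 1. */
static const struct type_info type_table[] = {
    { "char",   sizeof(char),   _Alignof(char)   },
    { "short",  sizeof(short),  _Alignof(short)  },
    { "int",    sizeof(int),    _Alignof(int)    },
    { "long",   sizeof(long),   _Alignof(long)   },
    { "float",  sizeof(float),  _Alignof(float)  },
    { "double", sizeof(double), _Alignof(double) },
    { "void *", sizeof(void *), _Alignof(void *) },
};

void print_type_table(void)
{
    size_t n = sizeof type_table / sizeof type_table[0];
    printf("%-8s %5s %6s\n", "type", "size", "align");
    for (size_t i = 0; i < n; i++)
        printf("%-8s %5zu %6zu\n", type_table[i].name,
               type_table[i].size, type_table[i].align);
}
```

Size and alignment are reported separately on purpose: C only requires that a type's size be a multiple of its alignment, not that the two be equal.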

What are they trying to determine?

Consider the following code:

```c
#include <stdint.h>

_Bool is_word_aligned(int anyint) {
    return (((uintptr_t) &anyint) % sizeof(int) == 0);
}
```

The main question being discussed is roughly equivalent to this one: "can function is_word_aligned() ever return a falsey result?". Parts of the discussion take as given that the system's natural word size is 4 bytes, but I have instead written the word size as sizeof(int); that correspondence is typical of 32-bit systems, but not guaranteed anywhere. "Word size" is not a C concept.

I have also assumed that casting a pointer value to integral type yields the corresponding numeric address in the process's address space; this also is typical, but not guaranteed. Nevertheless, the discussants also seem to be making that assumption, for otherwise there is no way in C to perform the kinds of tests on an address that they are talking about.
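For reference, the guarantee C actually makes about such conversions is narrow. A sketch assuming `uintptr_t` exists (it is optional in C99 and C11); `round_trips` is a hypothetical name:

```c
#include <stdint.h>

/* The only portable promise: a valid void * converted to uintptr_t and
 * back compares equal to the original pointer. That the integer is the
 * "numeric address" is an implementation property, not a C rule. */
int round_trips(void *p)
{
    uintptr_t n = (uintptr_t)p;
    void *q = (void *)n;
    return q == p;
}
```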

The two parties acknowledge that a char * may point to any address at all. That follows from C's specifications if char corresponds to the smallest addressable unit of storage, which, again, is typical, but not guaranteed. The two employees seem to be discussing existing code that performs internal consistency checking. It seems that the existing code performs explicit conversions from some pointer type to char *, and then uses pointer arithmetic to address individual bytes of the pointed-to object. Employee 1 proposes casting instead to int *, and supposes that if the machine architecture and C implementation require ints to be word-aligned, then the code could add that as a validation check.
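Employee 1's proposal might look roughly like the following. This is a hypothetical sketch, assuming a C11 compiler and that the `uintptr_t` value is the numeric address; it tests against the implementation's alignment requirement for `int` rather than a hard-coded word size:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sanity check: returns true if p is NOT a multiple of
 * the alignment this implementation requires for int. Only meaningful
 * for pointers that are supposed to designate an int. */
bool int_ptr_misaligned(const void *p)
{
    return (uintptr_t)p % _Alignof(int) != 0;
}
```

The check is one-sided: a `true` result means the pointer cannot designate a correctly aligned `int`, while `false` says nothing about whether the pointer is otherwise valid.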

How much of this information makes sense, and how much of it is questionable?

To the extent that any information is presented, that information is plausible. Whether Employee 1's proposal is sensible is a different question. Employee 2 argues that it is not, on the basis that most values are not required to be word-aligned on the underlying machine architecture. This seems to be a pretty strong argument. Employee 1 observes that in practice, the C implementation does align storage on word boundaries, but it is difficult to know whether that can be relied upon as an absolute rule. Moreover, if the original pointer in question, before conversion, is not an int *, then there is no particular reason to take it as a sign of invalidity that converting that pointer to an int * yields a result that does not correspond to a word-aligned address.
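The last point can be illustrated with a pointer that is valid but does not point to an `int`. A sketch assuming a C11 compiler (for `_Alignas`), a 4-byte word and a 2-byte `short`; the function names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Word-alignment test with 4 as the assumed word size. */
static bool word_aligned(const void *p)
{
    return (uintptr_t)p % 4 == 0;
}

/* &pair[1] is a perfectly valid, correctly aligned short *, yet with
 * 2-byte shorts it lies 2 bytes past a word boundary. */
bool valid_short_ptr_fails_word_check(void)
{
    _Alignas(4) short pair[2] = { 1, 2 };
    short *second = &pair[1];
    return *second == 2 && !word_aligned(second);
}
```

On such an implementation the function returns true: a correct `short *` fails a word-alignment test, so failing that test is not by itself evidence of corruption.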

John Bollinger
  • What is your `is_word_aligned` actually computing? – user3528438 May 17 '16 at 18:03
  • @user3528438, `is_word_aligned()` converts the address of parameter `anyint` to an unsigned integer, and determines whether that integer is a multiple of the size of an `int`. If so, it returns a truthy value (1) and if not, it returns a falsey value (0). As I said, this is used to describe a *rough* equivalent to the question in the OP's discussion. The alternative I present is not an exact equivalent, nor can I provide one without reference to the code that was the subject of the conversation in question. – John Bollinger May 17 '16 at 18:12
  • `anyint` is in the callee's scope and is allocated "naturally" with automatic storage duration, so it's not computing anything. It's not like C++, where you can pass by reference and the address is then taken from the variable in the caller's scope. – user3528438 May 17 '16 at 19:20
  • @user3528438, we are apparently having a disagreement about how "rough" is the equivalence I describe. Yes, `anyint` is local to the function. The corresponding argument is passed by value, and that value is not used. But nothing in the standard requires the alignment requirement of an `int` to be the same as its size, therefore the standard alone does not enable you to predict the function result. Which implementation details are required to allow you to predict the result is the crux of the matter. – John Bollinger May 17 '16 at 20:42
  • So you are basically rolling the dice, like "if the default alignment of `anyint` is 1 byte but its size is 4 bytes, then calling that function has a 25% chance of returning true"? – user3528438 May 17 '16 at 21:22
  • @user3528438, not exactly. This is what I actually say: "The main question being discussed is roughly equivalent to this one: 'can function is_word_aligned() ever return a falsey result?'" This is a thought experiment. If `sizeof(int) == 4`, and the combination of machine architecture and C implementation requires that `int`s be aligned on 4-byte boundaries, and we assume that all declared `int`s will be aligned correctly, then the function must always return `true`. If `int`s do not have to be aligned on 4-byte boundaries, then it is conceivable that the function returns `false`. – John Bollinger May 17 '16 at 22:03

In its simplest form, the following is valid for a PPC 604e processor running Wind River C code.

For any defined variable in the code such as:

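```c
#include <stdlib.h>

int a;
char c;
static unsigned short s_us;

void example(void)
{
    char *ptr = malloc(50);  /* inside a function: malloc() is not a constant initializer */
    free(ptr);
}
```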

the address of each of these variables, and the value of ptr, will be on a word boundary (specific to the word size of the system architecture).
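Whether this holds for a given toolchain can be checked directly. A hypothetical probe, assuming 4-byte words and that `uintptr_t` yields the numeric address; it mirrors the variables above and prints each address's offset from the preceding word boundary, so any nonzero value is a counterexample:

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int a;
char c;
static unsigned short s_us;

/* Address of p modulo 4, the assumed word size. */
static unsigned word_offset(const void *p)
{
    return (unsigned)((uintptr_t)p % 4);
}

/* Prints each offset; returns 0 on success, -1 if malloc fails. */
int report_word_offsets(void)
{
    char *ptr = malloc(50);
    if (ptr == NULL)
        return -1;
    printf("&a    %% 4 = %u\n", word_offset(&a));
    printf("&c    %% 4 = %u\n", word_offset(&c));
    printf("&s_us %% 4 = %u\n", word_offset(&s_us));
    printf("ptr   %% 4 = %u\n", word_offset(ptr));
    free(ptr);
    return 0;
}
```

Running this on the actual target, rather than on a development host, is what settles the question for that compiler and linker.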

The historical reason for this is that back when I started in computers and we rode dinosaurs to the office (we actually had real offices then), many system architectures required all memory accesses to be on word boundaries, and they would trigger faults if an attempt was made to access memory at an odd address. To access odd-addressed data, the full word needed to be loaded into a register and then shifted.
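The load-and-shift technique can be simulated in portable C. A sketch assuming a memory that allows only aligned 32-bit word loads and uses big-endian byte numbering (PowerPC's usual mode), so byte 0 of a word is its most significant byte; `load_byte` is a hypothetical name:

```c
#include <stddef.h>
#include <stdint.h>

/* memory is an array of aligned 32-bit words. To read the byte at an
 * arbitrary byte address: load the word containing it, shift the
 * wanted byte down to the low 8 bits, and mask off the rest. */
uint8_t load_byte(const uint32_t *memory, size_t byte_addr)
{
    uint32_t word  = memory[byte_addr / 4];                  /* aligned word load */
    unsigned shift = 8u * (3u - (unsigned)(byte_addr % 4));  /* big-endian */
    return (uint8_t)((word >> shift) & 0xFFu);
}
```

For example, with `memory = {0x11223344, 0xAABBCCDD}`, `load_byte(memory, 5)` returns `0xBB`.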

The modern reason is efficiency. There are still architectures that pay a performance penalty to access data that is not word-aligned. PPC is one such architecture.

Peter Mortensen