17

During a code review I've come across some code that defines a simple structure as follows:

class foo {
   unsigned char a;
   unsigned char b;
   unsigned char c;
};

Elsewhere, an array of these objects is defined:

foo listOfFoos[SOME_NUM];

Later, the structures are raw-copied into a buffer:

memcpy(pBuff,listOfFoos,3*SOME_NUM);

This code relies on the assumptions that: a.) The size of foo is 3, and no padding is applied, and b.) An array of these objects is packed with no padding between them.

I've tried it with GNU on two platforms (RedHat 64b, Solaris 9), and it worked on both.

Are the assumptions above valid? If not, under what conditions (e.g. change in OS/compiler) might they fail?

Adam Holmberg
  • @Matthieu: Thanks for reminding us. I'm sure the OP had overlooked that. –  Oct 31 '10 at 14:27

9 Answers

22

It would definitely be safer to do:

sizeof(foo) * SOME_NUM
Peter Mortensen
ThePosey
20

An array of objects is required to be contiguous, so there's never padding between the objects, though padding can be added to the end of an object (producing nearly the same effect).

Given that you're working with chars, the assumptions are probably right more often than not, but the C++ standard certainly doesn't guarantee it. A different compiler, or even just a change in the flags passed to your current compiler, could result in padding being inserted between the elements of the struct, following the last element of the struct, or both.
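The distinction between what is guaranteed and what is merely likely can be written down as compile-time checks. This is a minimal sketch, assuming a C++11 compiler (for the built-in static_assert); the value chosen for SOME_NUM is a placeholder, since the question does not give one.

```cpp
#include <cstddef>

// Same members as the class in the question.
struct foo {
    unsigned char a;
    unsigned char b;
    unsigned char c;
};

// Placeholder value; the real SOME_NUM is not shown in the question.
constexpr std::size_t SOME_NUM = 8;

// Guaranteed by the language: array elements are contiguous, so an array of
// N objects occupies exactly N * sizeof(element) bytes.
static_assert(sizeof(foo[SOME_NUM]) == SOME_NUM * sizeof(foo),
              "arrays have no padding between elements");

// NOT guaranteed: this holds only if the compiler adds no trailing padding.
// On a platform where it fails, the build stops here instead of the
// memcpy silently copying too few bytes.
static_assert(sizeof(foo) == 3, "foo is expected to be exactly 3 bytes");
```

The first assertion can never fire; the second documents the assumption that the hard-coded 3 depends on.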

Jerry Coffin
  • 1
    It certainly wouldn't surprise me if a compiler decided it liked things on four-byte boundaries, and put a byte of padding at the end. – David Thornley Nov 04 '09 at 20:34
  • I know this is an old question, but I'm wondering: where in the C++ standard is it stated that padding can be added to the _end_ of the object, and not the beginning? I read that it must be contiguous and that an array new-expression may allocate more space than required, but I cannot find any info that, e.g., the array object and the first element have the same address. [basic.compound] note 4 says this, but it doesn't seem to be a requirement. The standard doesn't seem to give clear explicit guarantees. – JHBonarius Jan 17 '22 at 10:24
  • 1
    @JHBonarius: The current standard only gives this guarantee with respect to standard layout objects and non-bitfield members. The normative text is at [class.mem]/26: "If a standard-layout class object has any non-static data members, its address is the same as the address of its first non-static data member if that member is not a bit-field." – Jerry Coffin Jan 17 '22 at 16:39
6

If you copy your array like this you should use

memcpy(pBuff,listOfFoos,sizeof(listOfFoos));

This will always work as long as you allocated pBuff to the same size. This way you are making no assumptions on padding and alignment at all.

Most compilers align a struct or class to the required alignment of the largest type it contains. Because your class contains only chars, that means no alignment requirement and no padding. If you add a short, for example, your class would typically be 6 bytes large, with one byte of padding added between the last char and the short.
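A sketch of how the sizeof-based copy could be wrapped so that the "same size" condition on the destination is checked rather than assumed. The helper name copyArray and its signature are hypothetical, not part of the original code, and C++11 is assumed.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical helper: copies a whole array into a raw buffer. The byte count
// comes from sizeof, so any padding inside the elements is included.
// Returns the number of bytes copied, or 0 if the destination is too small.
template <typename T, std::size_t N>
std::size_t copyArray(void* dst, std::size_t dstSize, const T (&src)[N]) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy is only valid for trivially copyable types");
    if (dstSize < sizeof(src)) {
        return 0;  // refuse to write past the end of the destination
    }
    std::memcpy(dst, src, sizeof(src));
    return sizeof(src);
}
```

With the names from the question, a call might look like copyArray(pBuff, buffSize, listOfFoos), where buffSize is whatever size pBuff was allocated with.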

nschmidt
5

I think this works because all of the fields in the structure are chars, which have an alignment of 1. If at least one field has an alignment other than 1, the alignment of the structure/class will not be 1 (the resulting layout will depend on the field order and alignment).

Let's look at an example:

#include <stdio.h>
#include <stddef.h>

typedef struct {
    unsigned char a;
    unsigned char b;
    unsigned char c;
} Foo;
typedef struct {
    unsigned short i;
    unsigned char  a;
    unsigned char  b;
    unsigned char  c;
} Bar;
typedef struct { Foo F[5]; } F_B;
typedef struct { Bar B[5]; } B_F;


#define ALIGNMENT_OF(t) offsetof( struct { char x; t test; }, test )

int main(void) {
    /* sizeof and offsetof yield size_t, which needs %zu (C99), not %d */
    printf("Foo:: Size: %zu; Alignment: %zu\n", sizeof(Foo), ALIGNMENT_OF(Foo));
    printf("Bar:: Size: %zu; Alignment: %zu\n", sizeof(Bar), ALIGNMENT_OF(Bar));
    printf("F_B:: Size: %zu; Alignment: %zu\n", sizeof(F_B), ALIGNMENT_OF(F_B));
    printf("B_F:: Size: %zu; Alignment: %zu\n", sizeof(B_F), ALIGNMENT_OF(B_F));
}

When executed, the result is:

Foo:: Size: 3; Alignment: 1
Bar:: Size: 6; Alignment: 2
F_B:: Size: 15; Alignment: 1
B_F:: Size: 30; Alignment: 2

You can see that Bar and B_F have alignment 2 so that the field i will be properly aligned. You can also see that the size of Bar is 6 and not 5. Similarly, the size of B_F (5 of Bar) is 30 and not 25.

So, if you hard-code the size instead of using sizeof(...), you will get a problem here.

Hope this helps.

NawaMan
  • Looks great; unfortunately, the anonymous struct inside the offsetof call does not compile in MSVC 2010. –  Oct 31 '10 at 14:37
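Where the anonymous struct inside offsetof is rejected, the same measurement can be made with the alignof operator that C++11 added. The following is a sketch under that assumption (it says nothing about which compiler versions support it); the names report and reportAll are made up for illustration, and Foo and Bar mirror the structs in the answer.

```cpp
#include <cstdio>

struct Foo { unsigned char a, b, c; };
struct Bar { unsigned short i; unsigned char a, b, c; };

// Prints the size and alignment of T using the built-in alignof operator
// instead of the offsetof-based ALIGNMENT_OF macro.
template <typename T>
void report(const char* name) {
    std::printf("%s:: Size: %zu; Alignment: %zu\n", name, sizeof(T), alignof(T));
}

// Hypothetical driver: reports on both structs and on arrays of five of each.
inline void reportAll() {
    report<Foo>("Foo");
    report<Bar>("Bar");
    report<Foo[5]>("Foo[5]");
    report<Bar[5]>("Bar[5]");
}
```

Applied to an array type, alignof yields the alignment of the element type, which is why the arrays report the same alignment as their elements.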
2

I would play it safe and replace the magic number 3 with sizeof(foo).

My guess is that code optimised for future processor architectures will probably introduce some form of padding.

And trying to track down that sort of bug is a real pain!

Peter Mortensen
Rob Wells
2

It all comes down to memory alignment. Typical 32-bit machines read or write 4 bytes of memory per attempt. This structure is safe from problems because it falls under that 4 bytes easily with no confusing padding issues.

Now if the structure was as such:

class foo {
   unsigned char a;
   unsigned char b;
   unsigned char c;
   unsigned int i;
   unsigned int j;
};

Your coworker's logic would probably lead to

memcpy(pBuff,listOfFoos,11*SOME_NUM);

(3 chars = 3 bytes, 2 ints = 2*4 bytes, so 3 + 8)

Unfortunately, due to padding, the structure actually takes up 12 bytes. This is because you cannot fit three chars and an int into that 4-byte word, so there's one byte of padding that pushes the int into its own word. This becomes more and more of a problem the more diverse the data types become.
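The arithmetic above can be left to the compiler. This sketch assumes C++11; the names fooWithInts, naiveSize, paddingBytes and bytesMissed are invented for illustration.

```cpp
#include <cstddef>

// The structure from this answer, renamed to avoid clashing with foo.
struct fooWithInts {
    unsigned char a;
    unsigned char b;
    unsigned char c;
    unsigned int i;
    unsigned int j;
};

// The hand-counted size: just the sum of the member sizes.
constexpr std::size_t naiveSize =
    3 * sizeof(unsigned char) + 2 * sizeof(unsigned int);

// Whatever the compiler added on top of the members is padding.
constexpr std::size_t paddingBytes = sizeof(fooWithInts) - naiveSize;

// Bytes a memcpy of naiveSize * count would leave uncopied for count elements.
constexpr std::size_t bytesMissed(std::size_t count) {
    return count * paddingBytes;
}

// The int member must start at a multiple of its own alignment.
static_assert(offsetof(fooWithInts, i) % alignof(unsigned int) == 0,
              "i must be suitably aligned");
```

On a platform with 4-byte ints aligned to 4 bytes, as the answer assumes, naiveSize would be 11 and paddingBytes 1; other platforms may give different values.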

Afcrowe
2

For situations where stuff like this is used, and I can't avoid it, I try to make the compilation break when the presumptions no longer hold. I use something like the following (or Boost.StaticAssert if the situation allows):

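// Emulation for compilers without a built-in static_assert (C++11 and later
// provide one as a keyword, so the macros are only needed on older compilers).
// Macro for "static-assert" (only useful on compile-time constant expressions)
#define static_assert(exp)           static_assert_II(exp, __LINE__)
// Macro used by static_assert macro (don't use directly)
#define static_assert_II(exp, line)  static_assert_III(exp, line)
// Macro used by static_assert macro (don't use directly)
#define static_assert_III(exp, line) enum static_assertion##line{static_assert_line_##line = 1/(exp)}

// Usage: division by zero in a constant expression breaks the build if foo grows
static_assert(sizeof(foo) <= 3);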
Peter Mortensen
S.C. Madsen
1

As others have said, using sizeof(foo) is a safer bet. Some compilers (especially esoteric ones in the embedded world) will add a 4-byte header to classes. Others can do funky memory-alignment tricks, depending upon your compiler settings.

For a mainstream platform, you're probably all right, but it's not a guarantee.

Mike Lewis
0

There might still be a problem with sizeof() when you are passing the data between two computers. On one of them the code might compile with padding and on the other without, in which case sizeof() would give different results. If the array data is passed from one computer to the other, it will be misinterpreted because the array elements will not be found where expected. One solution is to make sure that #pragma pack(1) is used whenever possible, but that may not be enough for the arrays. It is best to foresee the problem and pad each array element to a multiple of 8 bytes.
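A different approach from the one suggested in this answer is to stop copying the in-memory representation altogether and write each member into the byte buffer explicitly, so the layout of the transferred data is fixed by the code rather than by the compiler. This is only a sketch; serializeFoos, deserializeFoos and kWireSize are hypothetical names.

```cpp
#include <cstddef>

// Same members as the class in the question.
struct foo {
    unsigned char a;
    unsigned char b;
    unsigned char c;
};

// Bytes per element in the buffer: chosen by this code, independent of sizeof(foo).
constexpr std::size_t kWireSize = 3;

// Writes count elements into out, which must hold count * kWireSize bytes.
// Returns the number of bytes written.
inline std::size_t serializeFoos(const foo* in, std::size_t count, unsigned char* out) {
    for (std::size_t i = 0; i < count; ++i) {
        out[i * kWireSize + 0] = in[i].a;
        out[i * kWireSize + 1] = in[i].b;
        out[i * kWireSize + 2] = in[i].c;
    }
    return count * kWireSize;
}

// Reads count elements back from a buffer produced by serializeFoos.
inline void deserializeFoos(const unsigned char* in, std::size_t count, foo* out) {
    for (std::size_t i = 0; i < count; ++i) {
        out[i].a = in[i * kWireSize + 0];
        out[i].b = in[i * kWireSize + 1];
        out[i].c = in[i * kWireSize + 2];
    }
}
```

Because every member here is a single byte, byte order is not a concern; wider members would also need an agreed byte order, which this sketch does not address.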