
I've read about the Cello fat pointer library and wanted to know if prepending a header to some data is actually allowed in C, e.g. something like this:

#include <stdlib.h>
#include <string.h>

struct header
{
  size_t length;
};

char* create_string(const char* cstr)
{
  void* ret = malloc(sizeof(struct header)+strlen(cstr)+1);
  ((struct header*)ret)->length = strlen(cstr);
  strcpy((char*)ret + sizeof(struct header), cstr);
  return (char*)ret + sizeof(struct header);
}

size_t get_length(const char* sized_string)
{
  struct header* tmp = sized_string - sizeof(struct header);
  return tmp->length;
}

This example is a string, but it could be anything else that is stored, maybe even something that isn't an array, in which case the header could have different kinds of metadata.

I know that sds uses flexible array members, but that requires C99 and isn't quite as flexible as this approach (unless you just used a generic char array and recast it as needed).

Specifically in this question people say the following is not actually portable:

struct header {
  size_t len;
  unsigned char data[1];
};

because accessing `data` beyond the first element is UB. Another thing is that the lines `((struct header*)ret)->length = strlen(cstr);` and `struct header* tmp = sized_string - sizeof(struct header);` look wrong to me (because of the pointer casts), and I don't see a better way to write them.

Kona98
  • One thing you need to be careful about is that you can cause your data to be incorrectly aligned due to the offset introduced by your header. This can result in a performance impact or even a crash in some cases. You would need to ensure that the size of your header is an integer multiple of your target platform's malloc alignment (typically 16 for 64-bit platforms, 8 for 32-bit). – Paul R May 16 '18 at 17:03
  • @PaulR Yeah, I forgot about unaligned access. I don't really care that much about the performance, but accessing data unaligned is UB, and trying to force alignment is guesswork and not totally portable (and wastes space in most cases). – Kona98 May 16 '18 at 17:11
  • "Specifically in this question people say the following is not actually portable" (general FAM discussion): most of those answers are from 10 years ago. The [2017 answer](https://stackoverflow.com/a/46251908/2410359) applies better in 2018. – chux - Reinstate Monica May 16 '18 at 17:15
  • @chux That answer also says that the struct I showed is UB ("that yield undefined behavior (e.g. as in long v[0]; or long v[1];)"), so doesn't really matter in this case. – Kona98 May 16 '18 at 17:21
  • @MarcSchulze With your structure `data[2]` is UB – Stargateur May 16 '18 at 17:23
  • @MarcSchulze The higher point is that `struct header { size_t len; unsigned char data[]; };` is preferable in 2018, vs. this question's approach. – chux - Reinstate Monica May 16 '18 at 17:23
  • @chux With the FAM, how could I do what I did with my functions (return the char pointer and retrieve the length later)? some_string - sizeof(len) could point into padding bytes. – Kona98 May 16 '18 at 17:33
  • @MarcSchulze I'd recommend against returning a `char *`, and instead returning a `struct header *`. If other code needs the `char *`, access the `data` member. If stuck with returning a `char *`, return `&p->data[0]`. Easy enough to reconstruct `p` from that. – chux - Reinstate Monica May 16 '18 at 17:45
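
The suggestion in the last comment could look roughly like the following. This is only a sketch, assuming C99 or later: the names `fam_header`, `fam_create` and `fam_from_data` are made up for illustration, and `data` is declared as `char` rather than `unsigned char` so it can be used directly as a string. Using `offsetof` instead of `sizeof(len)` sidesteps the concern about landing in padding bytes.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header with a C99 flexible array member. */
struct fam_header {
    size_t len;
    char data[];
};

/* Allocates header and string in one block, returns a pointer to the string. */
char *fam_create(const char *cstr)
{
    size_t len = strlen(cstr);
    struct fam_header *p = malloc(sizeof *p + len + 1);
    if (p == NULL)
        return NULL;
    p->len = len;
    memcpy(p->data, cstr, len + 1); /* copies the terminating '\0' too */
    return &p->data[0];
}

/* Reconstructs the header pointer from a pointer returned by fam_create().
   offsetof gives the exact distance from the start of the struct to data,
   regardless of any padding the compiler inserted. */
struct fam_header *fam_from_data(char *data)
{
    return (struct fam_header *)(void *)(data - offsetof(struct fam_header, data));
}
```

A caller would read the length with `fam_from_data(s)->len` and release the block with `free(fam_from_data(s))`. The pointer passed in must have come from `fam_create`; anything else is undefined behavior.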

1 Answer


Your example only writes to and reads from fixed offsets inside a single allocation, so the pointer arithmetic itself is perfectly legal, at least as long as the payload type is just `char`.

What is problematic, though, is alignment. This does not violate the language standard by itself, but the actual data ends up less strictly aligned than what compilers on x86, for example, provide by default.

So if the compiler aligns to 8 bytes by default (gcc and msvc on x86) or to 16 bytes (on x64), your example code provides only half that alignment, because the `size_t` header shifts the data by 4 or 8 bytes respectively.

This becomes illegal if the compiler assumes aligned memory for the stored type (alignment that would otherwise have been ensured by a matching malloc implementation, as well as by padding in stack layout and structs). Depending on the architecture, it can even cause faults, as some instructions require a minimum alignment.
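
One way to keep the data as well aligned as the allocation itself is to pad the header up to the strictest fundamental alignment. The following is a minimal sketch, assuming C11 (for `max_align_t`); the names `padded_header`, `create_string_aligned`, `get_length_aligned` and `destroy_string_aligned` are hypothetical. It trades some wasted space per allocation for alignment that does not depend on guessing platform constants.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header. The max_align_t member is never used; it only
   forces sizeof(union padded_header) to be a multiple of the strictest
   fundamental alignment, which is also what malloc guarantees. */
union padded_header {
    size_t length;
    max_align_t align_;
};

char *create_string_aligned(const char *cstr)
{
    size_t len = strlen(cstr);
    union padded_header *h = malloc(sizeof *h + len + 1);
    if (h == NULL)
        return NULL;
    h->length = len;
    char *data = (char *)(h + 1);   /* first byte after the padded header */
    memcpy(data, cstr, len + 1);    /* copies the terminating '\0' too */
    return data;
}

size_t get_length_aligned(const char *s)
{
    /* Step back exactly one header from the data pointer. */
    const union padded_header *h = (const union padded_header *)(const void *)s - 1;
    return h->length;
}

void destroy_string_aligned(char *s)
{
    if (s != NULL)
        free((union padded_header *)(void *)s - 1);
}
```

With this layout the pointer returned to the caller has the same alignment guarantee as the pointer returned by `malloc`, so it would also be usable for payloads other than `char`.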

Ext3h