7

I am trying to implement a TLV (Type-Length-Value) in C, however I am having issues with getting the dynamic size of value to work correctly.

My struct looks like this:

typedef struct __attribute__((packed)){
   unsigned char type;
   unsigned char length;
   unsigned char * value;
} TLV;

I am trying to cast an array to the struct so that I can easily access the type, and length. For example an array:

unsigned char test[5] = {(unsigned char)'T', 0x03, 0x01, 0x02, 0x03};

Where the 'T' in the array is the type, the first 0x03 is the length.

I am casting the array to the struct like so:

TLV* tlv = (TLV*)test; 

Yet when I try to access the value array I get a segmentation fault, even when I try to access the first element of the value memory address (which should be the first element in the array after the length).

How can I get around this segmentation fault?

Dean
  • 8,668
  • 17
  • 57
  • 86

2 Answers2

5

value is not array, it's a pointer (which is pointing somewhere outside the struct). If you want array (of unknown size), write unsigned char value[1] instead.

typedef struct __attribute__((packed)) {
    unsigned char type;
    unsigned char length;
    unsigned char value[1];
} TLV;

Having an array of size 1 allows you to actually address any number of bytes. That is actually UB, but it is actually used and works properly in all cases I saw.

GCC allows to use arrays of size 0. I am so used to that convention that I forgot that arrays of size 0 are not allowed in C.

Edit:

Long answer

There is a difference between arrays and pointers. While you can use the similar code to work with both, these are still different beasts.

Disclaimer: The following code works in gcc, but it might not be strictly valid. I did not try to make it completely valid.

Let's define two structures:

typedef struct {
    char p[20];
} sa;

typedef struct {
    char *p;
} sp;

And create instances of those:

sa x = { "Hello, world" };
sp y = { "Howdy, world" };

What's the difference between those two?

printf("%s\n", x.p); // prints "Hello, world"
printf("%s\n", y.p); // prints "Howdy, world"

What about addresses of these?

printf("address of x = %p\n", &x); // On my machine it prints 0x7fffacce9b20
printf("address of y = %p\n", &y); // 0x7fffacce9b10

Well.. not really interesting except that these numbers are .. quite similar - both structures are located in about the same spot - in my case it's stack, end of address space, but could be somewhere else.

printf("address of x.p = %p\n", &x.p); // 0x7fffacce9b20
printf("address of y.p = %p\n", &y.p); // 0x7fffacce9b10

Same numbers. As expected actually.

printf("address of x.p[0] = %p\n", &x.p[0]); // 0x7fffacce9b20 - same as before
printf("address of y.p[0] = %p\n", &y.p[0]); // 0x400764 - could be anything

Now these are different. The string "Hello, world" is located in the same spot as structure x, while string "Howdy, world" is located somewhere else - data segment, somewhere in the beginning of address space, but, again could be somewhere else.

So this is the difference: array is some data stored "here", while pointer is only address of the data stored "somewhere".

In your case you want to be able to keep the data somewhere "here" - right after the type and length. This is why you need arrays, not pointers.

I can not find any proof that the TLV implementation above is not UB, but I saw a lot of cases where array of chars was "parsed" by casting it to pointer to some structure. I even wrote a code like this myself.

0-size arrays

As I said before, arrays of size 0 are not allowed by C standard. But they are allowed by GCC and this is handy because it allows you to do the following:

typedef struct {
    unsigned char type;
    unsigned char length;
    unsigned char value[0];
} TLV;

int required_length = 10;
TLV *tlv = (TLV *) malloc(sizeof(TLV) + required_length);

Without 0-size arrays you'd have to add (or subtract? subtract I guess) 1 somewhere in the code above.

aragaer
  • 17,238
  • 6
  • 47
  • 49
-1

The following is almost entirely portable and certainly not UB due to aliasing because at no point is test dereferenced so you can forget about it.

What isn't (technically) portable is the assumption that there's no internal padding in the struct TLV.

To obtain portability I've removed the __attribute__((packed)).

If your compiler supports it then you are 100% in the clear with no UB.

That is unless you change value to an aligned type then you are likely to be broken. This all works because sizeof(unsigned char) has to be 1 and type alignment has to divide their size. Remember, if it doesn't malloc(n*sizeof(T)) for some type T is broken as an array of n elements of type T. The C standard is painted into a corner that unsigned char can't be aligned because it's always legit to treat memory as an array of char (either kind).

So the following program will either fail at the assert(.) or execute successfully. On all known platforms it will execute successfully as there are no known platforms that would choose to internally pad the given data structure - whether you specify packed or not.

But why do this:

#include <stddef.h>
#include <stdlib.h>
#include <stdio.h>
#include <assert.h>


typedef struct {
   unsigned char type;
   unsigned char length;
   unsigned char value;
} TLV;

static TLV dummy;

int main(void) {

    //There's no standard way to verify this at compile time.
    //NB: If you stick with packing or leave all the members of TLV the same type
    //Then this is almost certainly NOT an issue.
    //However the cast of test implicitly assumes the following is the case.
    //Here's a run-time check of a static constraint.
    assert(offsetof(TLV,value)==(sizeof(dummy.type)+sizeof(dummy.length)));

    unsigned char test[5] = {(unsigned char)'T', 0x03, 0x01, 0x02, 0x03};

    TLV* tlv=(TLV*)test;

    for(unsigned char i=0;i<tlv->length;++i){
        printf("%u\n",(&tlv->value)[i]);
    }

    (&tlv->value)[0]=253;
    (&tlv->value)[1]=254;
    (&tlv->value)[2]=255;

    for(unsigned char i=0;i<tlv->length;++i){
        printf("%u\n",(&tlv->value)[i]);
    }

    return EXIT_SUCCESS;
}

When you can do this (C99 onwards I'm told) and have no crappy alignment problems:

#include <stdlib.h>
#include <stdio.h>

typedef struct {
   unsigned char type;
   unsigned char length;
   unsigned char value[];//Variable length member.
} TLV;

int main(void) {

    TLV* tlv=malloc(sizeof(TLV)+3*sizeof(unsigned char));

    tlv->type='T';
    tlv->length=3;
    tlv->value[0]=1;
    tlv->value[1]=2;
    tlv->value[2]=3;

    for(unsigned char i=0;i<tlv->length;++i){
        printf("%u\n",tlv->value[i]);
    }

    tlv->value[0]=253;
    tlv->value[1]=254;
    tlv->value[2]=255;

    for(unsigned char i=0;i<tlv->length;++i){
        printf("%u\n",tlv->value[i]);
    }

    free(tlv);

    return EXIT_SUCCESS;
}

Notice that there is no compliant guaranteed way of allocating such things statically because there is no compliant guaranteed way of dictating the layout (and in particular size) of the structure so you cannot know how much space to allocate in the char array.

You can (of course) blend the solutions but if you pack the structure you are likely to break alignment of value (if something needing alignment) and if you don't risk the compiler internally padding TLV. That internal padding is unlikely in the current guise but actually very likely if you upgraded length to type of size_t - which is the natural 'full' answer.

The current length limit of 255 (on almost all platforms) is frankly stingy. It felt mean in 1993 writing in Turbo Pascal. In 2015 it's piffling. At least implement length as `unsigned int' unless you know such a tight ceiling is sufficient.

Persixty
  • 8,165
  • 2
  • 13
  • 35
  • OP states he is dereferencing the struct pointer *Yet when I try to access the value array I get a segmentation fault, even when I try to access the first element of the value memory address (which should be the first element in the array after the length).* Where value is the element in the struct and accessed via pointer. Therefore causing you know what ( Undfined b. ). – 2501 Jan 12 '15 at 14:15
  • @2501 He hasn't provided his code but I believe my first specimen answer resolves that problem. Are you saying I've missed the problem? – Persixty Jan 12 '15 at 14:17
  • Your second example is correct, but the first isn't, specifically these problems: Breaking strict aliasing rules here: `printf("%u\n",(&tlv->value)[i])`: *ISO/IEC 9899:201x 6.5,p7* and *6.2.7,p2*. And in the same line going out of bounds of the object when the value of i is more than zero: *6.5.6, p8*. In both cases Standard clearly says, quoting: *the behavior is undefined* – 2501 Jan 12 '15 at 14:19
  • @2501 I'll check but I believe you're over interpreting aliasing. Firstly you're not over array bounds. `test` was allocated as an array. It is definitely OK to perform pointer arithmetic in there. Second there is no dereferencing of aliases. At all, What are the two dereferences of aliases that you think are problematic? You can't pick on access and declare it aliasing. You need two. `test` is never dereferenced. It's fine. – Persixty Jan 12 '15 at 14:23
  • *test is never dereferenced.* test is dereferenced right here `printf("%u\n",(&tlv->value)[i]);` Dereferencing the struct pointer itself causes undefined behavior, since: *ISO/IEC 9899:201x 6.2.5, p28* states: *Pointers to other types need not have the same representation or alignment requirements.* which means that struct pointer cannot represent a char pointer. Even if you argue that `&tlv->value` doesn't dereference the pointer, the pointer itself isn't valid. – 2501 Jan 12 '15 at 14:23
  • @2501 As stated `unsigned char` cannot be aligned. I agree the tricks of V1 only work for `unsigned char` and other non-aligned types. At no point do I advocate V1. I'm just pointing out its limitations. None of which have anything to do with aliasing because there is no de-referencing of the other alias. read or write. – Persixty Jan 12 '15 at 14:26
  • @2501 Your cited code doesn't de-reference `test`. It de-references `tlv`. It de-references TO the array aliased by `test` but that is something entirely different. Aliasing isn't about there being aliases it's about using them. That code doesn't dereference `test` so it's all good. – Persixty Jan 12 '15 at 14:44
  • When I was talking about `test` I meant and referenced `tlv` from your example( my mistake ). But the conclusion is still the same, undefined behavior in first example. About `test` from OP, he says it is dereferenced because, again: *Yet when I try to access the value array I get a segmentation fault, even when I try to access the first element of the value memory address (which should be the first element in the array after the length).* which is undefined b. I also showed that the struct pointer isn't defined to be capable of holding a char pointer. You keep ignoring my comments. – 2501 Jan 12 '15 at 14:46
  • He gets a seg fault because something else wrong. It has nothing to do with aliasing. I don't get a seg fault. Seg fault is (unusually for C) one of the less likely outcomes of an aliasing issue in this case. Poking values into `test` and expecting the compiler to necessarily spot it is an alias for `tlv` could be a potential problem. For example: `(&tlv->value)[0]=7;test[2]=10;printf("%u\n",(&tlv->value)[0]);` would be UB if the compiler ignores the aliasing because (&tlv->value)[0] and test[2] are aliases. – Persixty Jan 12 '15 at 14:58
  • Program not crashing on your computer has nothing to do with correct C code. I've shown in many different ways how the code is broken with references to the Standard. Thus future readers wanting to write correct C code, will hopefully see why is it so with the information provided. The only reason I'm being persistent in this case is that bugs like this can be silent for a long time, and I do not wish them to happen to anyone. We should end this discussion. Good luck( honest ) writing answers in the future. – 2501 Jan 12 '15 at 15:12
  • @2501 Agreed. Not crashing for me proves nothing. But crashing for the OP does even less to show it's aliasing! You have misunderstood the requirement to reference through the aliases to cause an issue. There's nothing wrong here. I never dereference the 'suspect' alias. It's all good. – Persixty Jan 12 '15 at 15:40
  • Just as a punctuation to the discussion let me *yet again* show you where you do dereference it in the first example: `tlv->length` and `tlv->value`, `tlv` points to `test`, clear strict aliasing violation. So it is undefined. Bye. – 2501 Jan 12 '15 at 15:56
  • @2501: I'm reading paragraph 7 of 6.5 in ISO/IEC 9899:TC2 : "— an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union)," Notice that it talks about 'includes'. It's allowing this rather special case that we've lined a structure which contains members all of which are `unsigned int` over an array that is `unsigned int`. As I keep saying this isn't a good idea and any change to TLV will break it. But it isn't UB! – Persixty Jan 12 '15 at 18:16
  • @2501: I've posted a question asking for interpretations of that paragraph: http://stackoverflow.com/questions/27908626/aliasing-arrays-through-structs. I hope to see you there. In the car on my way home I was close to conceding and apologising. But I think that obscure paragraph actually permits this. I can't see what else that paragraph could be saying! If there is a better interpretation of that paragraph and I'm wrong I shall set a bounty on the question and award it to your reply! How's that? – Persixty Jan 12 '15 at 18:49