
To clear up any misunderstanding from the title (I wasn't sure how to phrase the question there): I want to read from a file (a char array) and pass it as a void* so that I can read values independently of their data type by incrementing the pointer. Here's a simple example of what I want to do in C:

char input[] = "D\0\0Ckjh\0";
char* pointer = &input[0];       //let's say 0x00000010
char type1 = *pointer;           //should be 'D'
pointer += sizeof(char);         //0x00000011
uint16_t value1 = *(uint16_t*)pointer; //should be 0
pointer += sizeof(uint16_t);     //0x00000013
char type2 = *pointer;           //should be 'C'
pointer += sizeof(char);         //0x00000014
uint32_t value2 = *(uint32_t*)pointer; //should be 1802135552

This is just for educational purposes, so I would like to know whether this is possible, or whether there is another way to achieve the same or a similar goal. It would also be nice to know about the speed: would it be faster to keep the array and bit-shift the chars as I read them, or is this approach actually faster?

Edit: fixed the C code and changed void* to char*.

Sourav Ghosh
CLover32
  • What happened when you tried the code? That should be the first step. If it doesn't work, then ask a question based on that. Note that void has no size, so incrementing a void pointer isn't really possible. char, on the other hand, is defined as size 1. This also depends on the architecture, memory management unit, etc., so it is quite a broad topic. – Sami Kuhmonen Aug 23 '17 at 09:51
  • You have many invalid operations (`*pointer`: dereferencing a `void *` pointer; `pointer++` and `pointer += 2`: a pointer of type `void *` used in arithmetic). – BLUEPIXY Aug 23 '17 at 09:52
  • If you want a `char` array, use `char*` throughout. There's no need to cast it to `void*` and back. – n. m. could be an AI Aug 23 '17 at 09:52
  • Have you tried the code? It should not work. You can't dereference a void pointer, nor use arithmetic operators on it. – Daniel Tran Aug 23 '17 at 09:54
  • @DanielTran surprisingly, with the gcc extension enabled, it will. – Sourav Ghosh Aug 23 '17 at 09:55

3 Answers


This is wrong in two ways:

  1. void is an incomplete type that cannot be completed. An incomplete type is a type without a known size. To do pointer arithmetic, the size must be known, and the same is true for dereferencing a pointer. Some compilers treat void as having the size of a char, but that's an extension you should never rely on. In standard C, incrementing a pointer to void is wrong and can't work.

  2. What you have is an array of char. Accessing this array through a pointer of a different type violates strict aliasing; you're not allowed to do that.

    That's actually not what your current code does -- looking at this line:

    uint32_t value2 = (int)*pointer; //should be 1802135552
    

    You're just converting the single byte (assuming your pointer points to char, see my first point) to a uint32_t. What you probably meant is

    uint32_t value2 = *(uint32_t *)pointer; //should be 1802135552
    

    which might do what you expect, but is technically undefined behavior.

The relevant reference for this second point is, for example, §6.5 p7 of N1570, the latest draft for C11:

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
— a type compatible with the effective type of the object,
— a qualified version of a type compatible with the effective type of the object,
— a type that is the signed or unsigned type corresponding to the effective type of the object,
— a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
— an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
— a character type.

One reason for this very strict rule is that it enables compilers to optimize based on the assumption that two pointers of different types (except char *) can never alias. Other reasons include alignment restrictions on some platforms.
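
To make the optimization argument concrete, here is a minimal sketch; the function name `store_both` is made up for this illustration. Because `int` and `float` are incompatible types, a compiler is allowed to assume that `i` and `f` never refer to the same object, so it may return the constant 1 without reloading `*i`. Called with two distinct objects, as intended, the function is well defined; handing it the same memory through a cast would be the kind of undefined behavior described above.

```c
/* Hypothetical example function, not taken from the question. */
int store_both(int *i, float *f)
{
    *i = 1;
    *f = 2.0f;   /* the compiler may assume this store cannot modify *i */
    return *i;   /* so this load may be folded to the constant 1 */
}
```

With `int a; float b;`, the call `store_both(&a, &b)` returns 1 and leaves `a == 1` and `b == 2.0f`.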

  • I don't think OP is casting / assigning the pointer to any different type. – Sourav Ghosh Aug 23 '17 at 09:54
  • So it is not possible to read a char array as a different type? – CLover32 Aug 23 '17 at 09:55
  • @SouravGhosh indeed, but given the string contains two `0` bytes, that's probably what he **meant** to do :o –  Aug 23 '17 at 09:55
  • @CLover32 it's possible but forbidden. Btw, what you probably *meant* to do was something like `uint16_t value1 = *((uint16_t *)pointer);`. –  Aug 23 '17 at 09:56
  • @FelixPalmen you're right, I'm going to fix it quickly. And may I ask why it is forbidden? – CLover32 Aug 23 '17 at 09:58
  • `which might do what you expect, but is technically undefined behaviour.` Technically it is not; formally, yes. @Felix Palmen should provide an example of actual UB in a practical situation, not one that intentionally abuses types and pointers. So an example like `char *x="hello world"; void *y = x; float z = *(float *)y;` is not allowed. As we agreed in another topic, I am too dumb to understand more abstract implications – 0___________ Aug 23 '17 at 10:09
  • @PeterJ_01 go do your own research. There are plenty of examples how violating strict aliasing could go wrong (and actually **went** wrong). –  Aug 23 '17 at 10:12
  • @FelixPalmen So just to get it straight: I cannot read, let's say, an int* from a random index in a char array, because I can't change where a void*/char* pointer is pointing. But I can still read a char[4] as a uint32_t? (I'm sorry, but English isn't my best language) – CLover32 Aug 23 '17 at 10:53
  • @CLover32 I don't understand your comment. Of course you can increment a pointer to point somewhere in the middle of an array (even one past the end is allowed). You just can't increment a `void *`, as `void` doesn't have a size. And you can't access an object through a pointer of a different type, except `char *`. –  Aug 23 '17 at 11:04
  • Okay, I think I got it. I can cast every integer* to a char* and void*, but not the other way around. So is it possible to make a uint32_t*, cast it to a char*, and then use it as a char array of size 4? – CLover32 Aug 23 '17 at 11:12
  • @CLover32 yes, that's perfectly fine. And the `memcpy()` method suggested in another answer (where you just copy the bytes of the representation to a variable of your target type) is fine as well, **as long as** you can make sure not to hit *trap representations*. –  Aug 23 '17 at 11:13
  • Thanks a lot. If I use memcpy() in the above example, could I do it like memcpy(value1, &input[1], 2) and memcpy(value2, &input[4], 4)? – CLover32 Aug 23 '17 at 11:21

UPDATE:

In the updated code in the question, the line

   uint16_t value1 = *(uint16_t*)pointer;

violates strict aliasing. It's invalid code.

For more details, read the rest of the answer.


Initial version:

Technically, you are not allowed to dereference a void pointer in the first place.

Quoting C11, chapter §6.5.3.2

[...] If the operand points to a function, the result is a function designator; if it points to an object, the result is an lvalue designating the object. If the operand has type ‘‘pointer to type’’, the result has type ‘‘type’’. [...]

However, void is a forever-incomplete type, so its storage size is not known; hence the dereference is not possible.

A gcc extension allows you to dereference a void pointer and perform arithmetic operations on it, treating it as an alias for a char pointer, but it is better not to rely on this. Please cast the pointer to either a character type or the actual (or a compatible) type, and then go ahead with the dereference.

That said, if you cast the pointer to a type that is neither a character type nor compatible with the type of the object it actually points to, and then dereference it, you'll violate the strict aliasing rule.

As mentioned in chapter §6.5,

An object shall have its stored value accessed only by an lvalue expression that has one of the following types

— a type compatible with the effective type of the object,

— a qualified version of a type compatible with the effective type of the object,

— a type that is the signed or unsigned type corresponding to the effective type of the object,

— a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,

— an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or

— a character type.

and, chapter §6.3.2.3

[....] When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object.
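
The direction this paragraph permits can be sketched as follows. The helper name `object_bytes` is made up for this example; it walks the bytes of any object through an `unsigned char` pointer, which is always allowed. The order in which the bytes of a multi-byte integer appear depends on the platform's byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copies the object representation of *obj into out,
   one byte at a time, through a character-type pointer. */
void object_bytes(const void *obj, size_t size, unsigned char *out)
{
    const unsigned char *p = obj;   /* points to the lowest addressed byte */
    for (size_t k = 0; k < size; ++k)
        out[k] = p[k];              /* successive bytes of the object */
}
```

For a `uint32_t v = 0x11223344u;`, calling `object_bytes(&v, sizeof v, buf)` fills `buf` with the four bytes 0x11, 0x22, 0x33 and 0x44 in whatever order the platform stores them.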

Sourav Ghosh

Even if you fix your code to cast the pointer to the correct type (like int *) before dereferencing it, you might have problems with alignment. For example, on some architectures you simply cannot read a 4-byte int if it is not aligned to a 4-byte boundary.

A solution which would definitely work is to use something like this:

int result;
memcpy(&result, pointer, sizeof(result));
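
Applied to the layout from the question, the same idea might look like the sketch below. The names `struct record` and `parse_record` are made up for this example, and the buffer is assumed to hold at least 8 bytes. The multi-byte values are copied in the host's byte order, so the result for `value2` differs between little-endian and big-endian machines.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical record layout taken from the question:
   one tag byte, a 16-bit value, one tag byte, a 32-bit value. */
struct record {
    char     type1;
    uint16_t value1;
    char     type2;
    uint32_t value2;
};

/* Reads the four fields from buf, which must hold at least 8 bytes.
   memcpy has no alignment requirement and does not violate strict aliasing. */
struct record parse_record(const char *buf)
{
    struct record r;
    const char *p = buf;

    r.type1 = *p;
    p += sizeof r.type1;
    memcpy(&r.value1, p, sizeof r.value1);   /* bytes 1..2 */
    p += sizeof r.value1;
    r.type2 = *p;
    p += sizeof r.type2;
    memcpy(&r.value2, p, sizeof r.value2);   /* bytes 4..7 */
    return r;
}
```

With `char input[] = "D\0\0Ckjh\0";`, `parse_record(input)` yields `type1 == 'D'`, `value1 == 0` and `type2 == 'C'`; `value2` is 1802135552 on a big-endian host and 6842987 on a little-endian one.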
aragaer
  • might want to add that this **can** copy a trap representation, so you have to be sure about the bit pattern `pointer` points to. Otherwise good solution. –  Aug 23 '17 at 10:24
  • @aragaer But this doesn't work on parts of the char array, like from index 1-5? Doesn't it copy from the start? – CLover32 Aug 23 '17 at 11:03
  • If you update `pointer` so that it points to correct byte it will work. Otherwise you can copy with any offset you want - `memcpy(&result, pointer+1, sizeof(result));` will not be much different. – aragaer Aug 23 '17 at 11:06
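
As for the bit-shifting alternative raised in the question, a minimal sketch could look like this. The helper name `read_u32_be` is made up, and big-endian order is assumed because that is the order in which the bytes 'k', 'j', 'h', 0 give the value 1802135552 the question expects. Unlike `memcpy`, the result does not depend on the host's byte order. Which variant is faster depends on the compiler and the target, so measuring is the only reliable way to find out.

```c
#include <stdint.h>

/* Hypothetical helper: assembles a 32-bit value from four bytes,
   most significant byte first (big-endian), independent of the host. */
uint32_t read_u32_be(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) |
           ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |
            (uint32_t)p[3];
}
```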