Following up on this question: Casting one struct pointer to another - C, I would like to know whether it is possible to access a member of a "specific" struct through a pointer to a "general" struct:

#include <stdio.h>
#include <stdlib.h>

enum type_e { CONS, ATOM, FUNC, LAMBDA };

typedef struct {
    enum type_e type;
} object;

typedef struct {
    enum type_e type;
    char *expression;
} lambda_object;

typedef struct {
    enum type_e type;
    object *car, *bus;
    int value;
} cons_object;


object *traverse(object *o){
    if (o->type == CONS){
        cons_object *cons = (cons_object*)o;
        traverse(cons->car);
        traverse(cons->bus);
        return (object*)cons;
    } else if (o->type == LAMBDA) {
        lambda_object *lam = (lambda_object*)o;
        return (object*)lam;    
    }
    return 0;
}

int main(){
    lambda_object l = {LAMBDA, "value to print\n"};
    object *p = traverse((object*)&l);
    printf("sizeof(object):%lu\nsizeof(lambda_object):%lu\n",sizeof(object), sizeof(lambda_object));
    printf("%s\n",*(p+4));

}

This emits no error, just "Command terminated", so I have no idea what went wrong. I suspect I tried to dereference a wrong address with *(p+4), but I know there is a pointer to my string there. By the definition of lambda_object, my pointer comes right after the enum (which is 4 bytes long, just like an int). So I should not be dereferencing a wrong address, yet it still fails. Why?

output:

a.c: In function ‘main’:
a.c:46:11: warning: format ‘%s’ expects argument of type ‘char *’, but argument 2 has type ‘object’ {aka ‘struct <anonymous>’} [-Wformat=]
  printf("%s\n",*(p+4));
          ~^    ~~~~~~

Press ENTER or type command to continue
sizeof(object):4
sizeof(lambda_object):16

Command terminated

EDIT: I have tried (char*)p[4]; it still terminates.

milanHrabos
  • This is what `union` is for. – Scott Hunter Aug 05 '20 at 12:41
  • This is not how pointer arithmetic works. `p+4` points 4 elements of type `object` after `p`. `*(p+4)` is the very same as `p[4]`. – Gerhardh Aug 05 '20 at 12:42
  • If `sizeof(object)` is not 1 (which is probably the case, as it contains an enum that is probably stored as an `int`), then `(p+4)` is not a pointer 4 bytes from p. If that's what you want, use `((char *)p + 4)`. But it's ugly, mistake-prone programming. Better is to use a `union` to define structures that take up the same space. – Brecht Sanders Aug 05 '20 at 12:43
  • @JonathanLeffler I did, it didn't help – milanHrabos Aug 05 '20 at 12:48
  • @milanHrabos with `p[4]` you are making the same mistake as before! `[]` binds more tightly (has higher precedence) than a cast. What stops you from using something like `((lambda_object*)p)->expression`? – th33lf Aug 05 '20 at 12:51
  • The `p[4]` is accessing data out of bounds. Crashes are permissible because of that. The suggestion to use `((char *)p + 4)` reduces the risk of a crash, but is unreliable at best (and would make maintenance of the code unnecessarily hard). The expression is misguided. I'm not sure why you think it does what you want, but it does not do anything sensible. Generic pointers are hard. – Jonathan Leffler Aug 05 '20 at 12:58
  • @th33lf well, this one does it right (the only one), but I would also like to be able to get at the address (via arithmetic) and dereference it. The access operator `.` (or `->` after dereferencing) just computes the offset within the struct. I want to do the same – milanHrabos Aug 05 '20 at 12:59
  • @JonathanLeffler, it outputs some garbage, so I suspect I don't get the right address, where the string begins. I do not know why; it should work (I tried `printf("%s\n", (char*)p+4);`) – milanHrabos Aug 05 '20 at 13:01
  • It's unclear to me what you are trying to achieve. Perhaps if you described that, someone could come up with a better approach. Here you are playing with fire and making assumptions that don't hold. Anyway - just for fun - try: `printf("%s\n", *(char**)((char*)p + ((unsigned long long)&l.expression - (unsigned long long)&l)));` – Support Ukraine Aug 05 '20 at 13:35
  • What's wrong with using `printf("%s\n", ((lambda_object*)p)->expression);`? If you cannot use that, then `printf("%s\n", *(char**)((char*)p + offsetof(lambda_object, expression)));` should work. – Ian Abbott Aug 05 '20 at 14:50

1 Answer

First of all, as many others pointed out in the comments, this is not the ideal way to do whatever it is that you are trying to achieve. The easiest and most portable way would be to use something like ((lambda_object*)p)->expression.

As for why your code behaves as it does, perhaps I can provide an explanation.

Before that, here is your program, 'fixed' to print the stored string exactly the way you wanted it to.

#include <stdio.h>
#include <stdlib.h>

enum type_e { CONS, ATOM, FUNC, LAMBDA };

typedef struct {
    enum type_e type;
} object;

typedef struct {
    enum type_e type;
    char *expression;
} lambda_object;

typedef struct {
    enum type_e type;
    object *car, *bus;
    int value;
} cons_object;


object *traverse(object *o){
    if (o->type == CONS){
        cons_object *cons = (cons_object*)o;
        traverse(cons->car);
        traverse(cons->bus);
        return (object*)cons;
    } else if (o->type == LAMBDA) {
        lambda_object *lam = (lambda_object*)o;
        return (object*)lam;    
    }
    return 0;
}

int main(){
    lambda_object l = {LAMBDA, "value to print\n"};
    object *p = traverse((object*)&l);
    printf("sizeof(object):%zu\nsizeof(lambda_object):%zu\n",sizeof(object), sizeof(lambda_object)); // %zu is the conversion for size_t

    printf("%s\n",*((char**)((char*)p+8))); // Note the casts and p + 8 instead of 4; 8 is the offset only on a typical 64-bit target
}

Coming to the reason for this, assuming a 64-bit machine, your lambda_object struct would look like this in memory:

| Bytes 0 to 3 | Bytes 4 to 7 | Bytes 8 to 15                |
--------------------------------------------------------------
| type         | padding      | expression                   |
--------------------------------------------------------------

What you should note here is that expression is a pointer to the string and not the string itself. So even though type is only 4 bytes long, expression starts at byte offset 8 from the start of the struct, not at offset 4 as one might expect. Bytes 4 to 7 are simply left unused as padding. This is because, on typical 64-bit platforms, an 8-byte pointer has to start at an 8-byte-aligned address.

But then ((char *)p + 8) should work, right? Unfortunately not! We started with p pointing to a lambda_object. Casting p to a char pointer lets us reach the right byte offset within the struct, but it also tells the compiler that there is a character at byte offset 8, when in fact what is stored there is a pointer to a character. If you pass this to printf(), it tries to print the bytes of that pointer as a string, resulting in gibberish.

What you should do now is dereference the pointer (char *)p + 8 to fetch the pointer expression, by telling the compiler to treat that address as a pointer to a pointer. This is achieved with a cast to (char**). Now you can dereference this once to get a char pointer and finally pass that on to printf().

th33lf
  • While `+ 8` is probably correct for most (if not all) 64 bit systems, it's probably wrong on 32 bit systems. The only way to do it is to calculate the offset of `expression` instead of using a hard-coded value. For this, see the comments under the question – Support Ukraine Aug 05 '20 at 15:04
  • @4386427 Agree with both comments! My idea was to explain the difference in behaviour between what the OP expects and what he gets. I guess all of us unanimously agree that the best approach is to simply use the structure to properly compute the offset. – th33lf Aug 05 '20 at 15:08
  • @th33lf I just could not get why to use a double pointer when you immediately dereference it. Why not just use a `char*` pointer and adjust its address to the beginning of the string? Then I realized that the "string pointer" (`char*`) is in the `text` segment (or read-only), so it is fixed, and the only option I have (to use arithmetic) is to use another pointer to find that fixed char pointer (and thus a double pointer, which is then dereferenced). The only thing I did not understand is why I need to use 2 pointers instead of one; the `segment` is the answer – milanHrabos Aug 05 '20 at 15:42
  • @milanHrabos Nope. The reason why we need a double pointer is that if you dereference a `char*`, you get a char which is one byte. Only one byte would be read from memory when you dereference it. But what you need to pass to `printf` is an 8 byte pointer which you only get by dereferencing a `char**`. You could also get away with typecasting to an `unsigned long*` (single pointer) instead of `char**` here and it would still work, although the compiler would complain. – th33lf Aug 05 '20 at 20:06
  • @th33lf then why even dereference? I can just *calculate* the (char*) address, no need for dereferencing and thus no need for a double pointer – milanHrabos Aug 05 '20 at 22:26
  • @milanHrabos That would have been true if the string was stored directly at location p + 8. But there is no string at that location. There is only `expression` which is a *pointer* to a string. To load that pointer into memory so that we can pass it to printf, one dereference is necessary. Another way to think about it is that `p` is a pointer to a struct. Dereference it and you get a struct. Only then can you access its members. – th33lf Aug 06 '20 at 08:24
  • @th33lf you are right. After seeing it in assembly, there are 2 pointers: one for the struct, another for the string pointer. I thought I could load `lea .LC0(%rip), %rdi` directly into `puts`, but the address of the string `.LC0` is first put on the stack, as part of the struct member. So in the end, there are 2 pointers and thus you need a double pointer. So you are right – milanHrabos Aug 06 '20 at 12:05