I'm actually not sure how the printf statement is working.

char *po;    
int y=9999;    
po = &y;    
printf("\n%d", *(int *)po);

I first create a char pointer assign an integer address to it, then print it back after typecasting. My guess is that (int *)po casts po to integer type, then *(int *)po retrieves the value pointed by this integer type pointer. Not sure though. Can someone explain it better?

What if po was still a char * but y was some struct with multiple different members, int, float, char etc.?

Jonathan Leffler
J.DOE
  • *What if po was still a `char *` but y was some struct with multiple different members, `int`, `float`, `char` etc.?* If you start with a `float` and then try to access it via an `int *`, that would be a [strict aliasing](https://stackoverflow.com/questions/98650/what-is-the-strict-aliasing-rule) violation. This question is really close to being a duplicate of that. – Andrew Henle Jul 16 '18 at 14:55
  • Not C++. Please remove redundant tag. – Martin James Jul 16 '18 at 14:57
  • 2
    `(int *)po` does a cast of the variable `po` to a pointer to an `int` and not to an `int`. The asterisk (*) indicates pointer. So what `*(int *)po` does is to cast the variable `po` to a pointer to an `int` and then dereferences the pointer to get to the actual `int` value pointed to by `(int *)po`. – Richard Chambers Jul 16 '18 at 15:08
  • 1
    (char *) is specifically exempt from strict aliasing. – Max Jul 16 '18 at 15:14
  • 5
    Is there a reason you're casting pointer types back and forth like this? Sometimes there's a good reason to, and sometimes it's legal, but other times, it just makes things unnecessarily confusing. If you have an `int`, why not use an `int *` to point to it? – Steve Summit Jul 16 '18 at 15:23

4 Answers

Let's go line by line.

char *po;   //po is a pointer to character.
int y=9999; //y is an int initialised to 9999.  
po = &y;    //po is assigned to point to the first character (i.e. byte) of y.
printf("\n%d", *(int *)po); //po is cast back to a pointer to int, dereferenced and printed.

There's an explicit rule in C that you can cast any pointer to a data type to a pointer to character type (signed char, unsigned char or the one-of-those but unspecified char). So this is guaranteed OK. Be careful though. C doesn't specify if the platform is big-endian or little-endian so there's no way of knowing upfront which byte of the int you're pointing to - most, least significant or other (on some obscure platforms).

That rule about casting any data-type pointer to a char pointer applies to everything: struct, double, float, the whole data-type shooting match.

Now if you cast it back to an int pointer and try to print the value, you could be in trouble. If you're dealing with a struct and its first member is an int, you're still fine. Otherwise you're straight into Undefined Behaviour. On some machines, assuming the thing you're pointing to is at least sizeof(int) long, you'll get whatever is stored there interpreted as an int, but on other machines the address may not be properly aligned. If the object is not large enough (< sizeof(int)), you may get a protection fault or even the value of the 'next aligned int up'.

On some very obscure architectures, you could hit a 'trap representation' and also abort.

Persixty

A pointer is a number which indicates the beginning of a memory section. For example, you can write

void *ptr = (void *)0xaa3156bc; // illustrative address only; not known to be valid

Now you have an address where you want to read data, but how do you interpret what's there?

Well, you can tell the compiler to read it as an int (typically 4 bytes) by converting to int *, as a double-precision number (typically 8 bytes) by converting to double *, or as a single byte by converting to char *.

int vali = *(int*)ptr;
double vals = *(double*)ptr; // valid operation in C but can fail/have unexpected consequences
char valc = *(char*)ptr;
char *valstr = (char*)ptr;

Normally, you cannot just read/write arbitrary memory addresses, so you need the address of some valid memory: an existing variable, or something that was allocated with malloc or new (C++). This restriction holds on modern processors with memory protection, but on an Arduino, for example, you should not get an access violation when reading/writing an arbitrary location within the address space.

float x = 10.0f;
void *ptr = (void*)&x;
int *ptrf = (int*)(void*)&x;
MyStructWithManyFields *ptrstruct = (MyStructWithManyFields *)(void*)&x;
// you can still do conversions:
float valf = *(float*)ptrstruct; // valf = 10.0f
int vali = *(int*)ptrstruct;  // vali = 1092616192 on typical platforms (IEEE 754 float, 32-bit int); note this read violates strict aliasing

You can use any pointer type as an intermediate, because the type is just a hint to the compiler. The type of a pointer matters only for the following reasons: as a hint to the developer, for pointer arithmetic, and for dereferencing:

// given
void *pvoid = (void *)0x0000aa10; // just as an example, do not do this in practice.
int *pi = (int*)pvoid;
char *pc = (char*)pvoid;
StructOfSize9bytes *ps = (StructOfSize9bytes *)pvoid;
// then
pvoid++; // not valid in standard C (GCC accepts it as an extension)
pi++; // pi = 0x0000aa14 (assuming a 4-byte int)
pc++; // pc = 0x0000aa11
ps++; // ps = 0x0000aa19
AndreiM
  • 2
    `void* ptr = 0xaa3156bc;` is a problem as `(void*)0xaa3156bc` is not known to be a legitimate address: (UB #1). `double vals = *(double*)ptr;` and others can readily fail (UB#2) as even if `ptr` is a legal address, it may not be properly aligned for the object type (`double`). – chux - Reinstate Monica Jul 16 '18 at 16:00
  • I changed the text a bit, should be better now. While I agree it's not "legitimate", it is a valid C operation (for example processors running in real mode). In my experience, C beginners can grasp the notion easier when they understand that a pointer is just a number. – AndreiM Jul 18 '18 at 09:53

The "type" of the pointer is not tracked in memory by the compiler. You can cast that pointer to any kind of pointer you want as an input and it doesn't make any difference to the compiler when processing the call to printf, nor does printf receive any type information that would allow it to fail safely if you pass in the "wrong" type.

There is a C facility called varargs, which is kind of an agreement between the compiler, the ABI, and the C standard library. You can read about it here: https://www.gnu.org/software/libc/manual/html_node/Variadic-Functions.html

In short, printf uses the varargs facility to iterate over the arguments passed into the function. At the same time, it evaluates the format string to interpret what kind of arguments should have been passed in. It then casts the varargs results as needed to format the data for output.

That's why mismatches between printf format strings and argument lists are such a huge security hole. Some compilers even go so far as to interpret the format string and warn you about mismatches between the parameters and the format.

Christopher
  • I used to think that for the compiler to interpret the format string and warn about mismatches was a pretty crazy thing to do, a "far way to go". But today it's clear that it's a necessity. Anyone not using such a compiler is setting themselves up for uncaught bugs, and (many if not most would say) behaving pretty irresponsibly, or at least unprofessionally. – Steve Summit Jul 16 '18 at 15:26
  • 1
    "You can cast that pointer to any kind of pointer you want as an input and it doesn't make any difference to printf." No. `printf("%p"...` expects a `void *` and likely tolerates a character pointer. Pointers of other types, especially pointers to functions are not specified to be acceptable to `printf("%p"...`. Best to use an explicit cast: `printf("%p", (void*) some_object_pointer)` – chux - Reinstate Monica Jul 16 '18 at 15:47
  • @chux Certainly printf cares what the pointer actually points to, but printf has no way to know what "type" you actually passed in. That's all I mean. – Christopher Jul 16 '18 at 17:34

I first create a char pointer assign an integer address

No, you create a pointer-to-char and assign the address of an integer to it.

An integer address isn't a thing, and integers don't live in some magically different address space to characters.

A pointer-to-integer is an address (which is essentially typeless and could point to any data type) coupled to a type (the thing stored at this address is an integer).

When you cast your int* to a char*, the address is unchanged. You're just choosing to lie to the compiler about the type stored at that address, for reasons best known to yourself.

My guess is that (int *)po casts po to integer type

When you cast po back to int*, the address is still unchanged, and it's still the address of your original integer. You just admitted to the compiler that it isn't "really" a char stored there.

Casting to "integer type" would mean (int)po, which isn't what you did. You seem to be confusing the type of the pointer with the type of the thing it points at.

then *(int *)po retrieves the value pointed by this integer type pointer. Not sure though.

Yes, that's correct. It's just the same as dereferencing any other pointer-to-integer, you get the value of the integer it points to. You could trivially split the expression up as

int *pi = (int*)po;
int i = *pi;

and then print that. You can also print the addresses with %p to confirm everything is what you expect (or just inspect these values in a debugger):

char *po;
int *pi;
int i;
int y=9999;    
po = (char *)&y;   
pi = (int *)po;
i = *pi;

printf("y=%d\n &y=%p\n po=%p\n pi=%p\n i=%d\n &i=%p\n",
       y, (void*)&y, (void*)po, (void*)pi, i, (void*)&i);

What if ... y was some struct with multiple different members ...

You're just asking in general what happens when you cast a pointer-to-X to a pointer-to-Y and back to pointer-to-X?

It's fine. You're just telling stories about the pointed-to type, but the address never changes.

If you want to access your X through a pointer-to-Y, you need to read the strict aliasing rules.

Useless
  • 2
    "When you cast your int* to a char*, the address is unchanged" --> The cast creates a pointer to the same memory. Yet pointers to `int` and pointers to `char` may encode differently and need not even be the same width. Both of the features are uncommon - yet supported by C. – chux - Reinstate Monica Jul 16 '18 at 15:44
  • I know that, for example, data and function pointers may be different. But the aliasing rules require the `int*` and `char*` to point at the same memory location, so I'd argue they're both the same address even if they're differently encoded. Certainly any transformation must be reversible. – Useless Jul 16 '18 at 16:10
  • Fair enough about _same memory location_. It is only a minor concern. Your last point about "cast a pointer-to-X to a pointer-to-Y and back to pointer-to-X" does over-generalize beyond casting through `void*/char *` though and is not quite fine. Overall, best answer so far. – chux - Reinstate Monica Jul 16 '18 at 16:16
  • 1
    @Useless: It is false that any transformation must be reversible. Per C 2011 (N1570) 6.3.2.3 7, while a pointer to an object type may be converted to a pointer to a different object type, “If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined.” If you are going to teach C rules to people, please follow the rules. – Eric Postpischil Jul 16 '18 at 16:48
  • 1
    There are a number of problematic statements in this answer. Re “No, you create a pointer-to-char and assign the address of an integer to it”: The OP’s statement to which this responds may be crudely worded, but English is permissive enough that it has an interpretation which is true. Re “an address (which is essentially typeless and could point to any data type)”: This is not generally true in C; the standard is written to permit architectures in which it is not true. Re “…integers don't live in some magically different address space to characters”: This is implementation-dependent. – Eric Postpischil Jul 16 '18 at 16:54
  • 1
    Re “When you cast your int* to a char*, the address is unchanged”: This is implementation-dependent. Re “You're just choosing to lie to the compiler about the type stored at that address”: This is false. Per C 2011 (N1570) 6.2.6, there are bytes at the `char` address converted from a pointer to an `int`, and programs may access them through a pointer to `char`. – Eric Postpischil Jul 16 '18 at 16:54