1

Question:

Can I have two pointers of different types (uint32_t * and char *) pointing to the very same address?


Here is why I want to have this:

I want to convert UTF-8 to UTF-32 and vice versa in C.

Let's say I have a variable of type uint32_t that contains one UTF-32 encoded Unicode character, and I already know that it needs 4 bytes when encoded in UTF-8. Its binary representation is this:

00000000000aaabbbbbbccccccdddddd

a, b, c and d are 4 different ranges where each bit can be 0 or 1. With clever bitwise &, | and << operations I can rearrange these bits so that at the end there is this new distribution:

00000aaa00bbbbbb00cccccc00dddddd

And then I can flip some bits (using | again), to get this

11110aaa10bbbbbb10cccccc10dddddd

When I split this into 4 consecutive char variables in an array, I have this:

11110aaa  10bbbbbb  10cccccc  10dddddd

which is exactly the UTF-8 encoding of the same Unicode character.
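
The rearrangement described above can be sketched with shifts and masks that write directly into a separate byte array, which avoids any dependence on how the bytes of a uint32_t are laid out in memory. This is only a minimal sketch: the function name `encode_utf8_4` is hypothetical, and it assumes the code point lies in the 4-byte range U+10000 to U+10FFFF.

```c
#include <stdint.h>

/* Hypothetical helper: encodes a code point in the range U+10000..U+10FFFF
 * as 4 UTF-8 bytes. Returns 4 on success, 0 if cp is outside that range. */
int encode_utf8_4(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x10000u || cp > 0x10FFFFu)
        return 0;
    out[0] = (unsigned char)(0xF0u | (cp >> 18));           /* 11110aaa */
    out[1] = (unsigned char)(0x80u | ((cp >> 12) & 0x3Fu)); /* 10bbbbbb */
    out[2] = (unsigned char)(0x80u | ((cp >> 6) & 0x3Fu));  /* 10cccccc */
    out[3] = (unsigned char)(0x80u | (cp & 0x3Fu));         /* 10dddddd */
    return 4;
}
```

For example, calling it with 0x1F600 fills the array with the bytes F0 9F 98 80.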

So, the very same 4 bytes in memory shall be one single uint32_t variable and at the same time an array of 4 char variables. I want to have this:

uint32_t *utf32;
char utf8[4];

  • utf32 is a pointer that points to a single 4 bytes long uint32_t variable.
  • utf8 is an array of 4 char elements, each 1 byte long (used like a pointer to its first element).

And I want both pointers to point to the very same address, so that I can write a UTF-32 encoded character into the variable utf32, transform it in place, and then read the result from the array utf8. Is this possible? If so: How can I do it?

(I used this technique very often when I was coding in COBOL in the previous millennium, because in COBOL it's easy to overload the same region in the memory with many different definitions. But I don't know how to do it in C.)


I have found a lot of questions dealing with 2 pointers pointing to the same address, but in these questions the pointers always have the same type. Some other questions are about why you get an error if a pointer defined with a certain type points to an address that was defined with another type. But I didn't find anything about two pointers of different types sharing the same address.

Hubert Schölnast
  • 2
    "*So, the very same 4 byte in memory shall be one single `uint32_t` variable and at the same time an array of 4 `char` variables*" - while that is certainly *possible* (by using a `union`, or 2 typed pointers to the same memory, as you ask), I wouldn't suggest doing that. `uint32_t` has endianness, so the order of its bytes may not match the order you need for the `char[]`. I would use a separate `char[]` and shift bits from the `uint32_t` into the `char[]` as needed, regardless of endianness. Also, because UTF-8 is variable-length anyway, not all `uint32_t` values will fill a `char[4]`. – Remy Lebeau Dec 22 '21 at 21:49
  • 1
    You can do this — but it helps a lot that one of your two pointers will be `char *`. If you had two pointers neither of which was `char *` — say, `int *` and `float *` — you'd have to worry about [*strict aliasing*](https://stackoverflow.com/questions/98650). But accessing via a `char` type is an explicit exception to that rule. – Steve Summit Dec 22 '21 at 21:53
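
The byte-order concern raised in these comments can be checked directly by looking at a uint32_t through a character-type pointer, which the aliasing rules permit. The following is a rough sketch only: the helper names `bytes_in_memory_order` and `is_big_endian` are hypothetical, and it assumes the platform is plainly big- or little-endian.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copies the bytes of v into out[] in the order
 * they are stored in memory. */
void bytes_in_memory_order(uint32_t v, unsigned char out[4])
{
    memcpy(out, &v, sizeof v);
}

/* Hypothetical helper: returns 1 if the most significant byte is stored
 * first. Reading an object through unsigned char * is always allowed. */
int is_big_endian(void)
{
    const uint32_t probe = 0x01020304u;
    const unsigned char *p = (const unsigned char *)&probe;
    return p[0] == 0x01;
}
```

On a little-endian machine the value 0xF09F9880 is stored as 80 98 9F F0, so the four bytes come out in the reverse of the order UTF-8 requires. That is why building the UTF-8 bytes in a separate array is the more portable choice.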

2 Answers

4

Can I have two pointers of different types (uint32_t * and char *) pointing to the very same address?

Yes, you can.

union U {
  uint32_t ui32;
  char c[4];
};

union U u;
u.ui32 = ...

uint32_t *pi = &u.ui32;
char *cp = u.c;

/* Cast both sides: comparing uint32_t * with char * directly is not valid C. */
assert((void *)pi == (void *)cp);

There are some C language rules which you'll violate if you use the resulting char* to do something other than copying the data in or out, but "two different pointer types pointing to the same address" is not a problem in itself.

You could also simply cast the address to desired type:

uint32_t x;
uint32_t *ip = &x;
char *cp = (char*)&x;

/* Cast both sides: comparing uint32_t * with char * directly is not valid C. */
assert((void *)ip == (void *)cp);
Employed Russian
  • While `union` works, it is not necessary. If the stored memory is of type `uint32_t` (or compatible), simply casting its pointer to `char*` will do. – user694733 Dec 23 '21 at 10:59
1

Yes, two pointers of different types can point to the same address.

Let's say that somewhere in your memory is this UTF-32 value and you know where it is; I will refer to that location as address.

If you want to treat these 4 bytes as a uint32_t, you could do this:

uint32_t* utf32 = address;

And you can just as easily treat it as a char array:

char* utf8 = address;

If you then want to access a char you just do:

utf8[index]
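
Put together, the idea might look like the sketch below. The function name `byte_at` is hypothetical, and the sketch assumes the memory really holds a uint32_t object. Going the other way, pointing a uint32_t * at a plain char array, can break alignment and aliasing rules.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns byte number `index` (0..3) of the uint32_t
 * that utf32 points to, read through a second pointer to the same address. */
unsigned char byte_at(uint32_t *utf32, size_t index)
{
    void *address = utf32;         /* the shared address */
    unsigned char *utf8 = address; /* second view of the same 4 bytes */
    return utf8[index];
}
```

Which of the four bytes appears at index 0 depends on the byte order of the machine.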
jstnklnr