
Enumerations in languages like Swift or Rust support a kind of hybrid "choice plus data" mechanism, such that I could define a type like:

enum SomeOption {
  None,
  Index(int),
  Key(string),
  Callback(fn),
}

Now if I were to implement this in C, my understanding is that something like this would not be valid:

typedef enum {
  is_callback_or_none,
  is_string,
  is_number
} my_choice;

typedef struct {
   my_choice value_type;
   void* value;
} my_option;

my_option x = {
  .value_type = is_number,
  .value = (void*)42
};
if (x.value_type == is_number) {
  int n = (int)x.value;
  // … use n…
}

I'm not sure what exactly I risk in doing this, but according to e.g. the question "Can pointers store values and what is the use of void pointers?", the only things I should store in a void* are actual addresses and NULL. [Aside: please turn a blind eye to the separate question of storing callback function pointers in a void*, which I forgot was problematic when I made up this example.]

I suppose a more proper way to do this would be to use a union, e.g.:

typedef struct {
   my_choice value_type;
   union {
      int number_value;
      char* string_value;
      void* pointer_value;
   };
} my_option;

…which is probably nicer all around anyway. But I'm wondering specifically about the invalidity of the void* value version. What if I were (instead of the union solution) to simply substitute uintptr_t in place of the void*?

typedef struct {
   my_choice value_type;
   uintptr_t value;
} my_option;

Would storing either a pointer to a string/callback/null or a number within the uintptr_t value field of this struct be valid and [at least POSIX-]portable code? And if so, why is that okay, but not the seemingly equivalent void* value version?

natevw
  • `uintptr_t` is unsigned, `int` is signed. Converting between them could run into overflow problems. – Barmar Jul 14 '21 at 19:03
  • What problem are you really trying to solve? What's the gain from using `(void*)x.value` instead of `x.pointer_value`? – Barmar Jul 14 '21 at 19:05
  • @Barmar The problem is that I don't understand if the rules are different re. what I can do with a `uintptr_t`/`intptr_t` vs. a `void*`, and if so, why they would be different? – natevw Jul 14 '21 at 19:20
  • The only reason for using `uintptr_t` is because you want to do something numeric with the pointer value. See https://stackoverflow.com/questions/1845482/what-is-uintptr-t-data-type – Barmar Jul 14 '21 at 19:22
  • `uintptr_t` has the property that a `void *` can be converted to `uintptr_t`, then converted back to `void *`, and the result will compare equal to the original pointer. Note that officially neither `uintptr_t` nor `void *` can be used to store a function pointer. And the C specification says nothing about the range of numeric values that can be stored in `uintptr_t`. So converting integer types to `uintptr_t` isn't well defined either. So just use the `union` and be done with it. – user3386109 Jul 14 '21 at 19:28
  • Reminder that function pointers aren't necessarily the same size/alignment as data pointers. – yhyrcanus Jul 14 '21 at 19:30
  • To put it another way, if you want an array that can hold anything, then a `union` consisting of a `void *`, a `void (*)(void)`, a `long long`, and an `unsigned long long` is the minimum that allows you to store any type, without violating any rules. – user3386109 Jul 14 '21 at 19:42
  • @user3386109 "if you want an array that can hold anything, then a union consisting of …" Boom! Thank you! That gets to the heart of my curiosity here. I'm guessing there's already Q&A threads here that cover that, otherwise I'll try to think of a productive way to ask that in a new thread. – natevw Jul 14 '21 at 22:42
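
For readers following the comment thread, here is a minimal sketch of the tagged-union approach suggested above. All names (`opt_tag`, `option`, `generic_fn`, and the helper functions) are hypothetical, and the union holds only the members the question's enum needs: one integer, one data pointer, and one function pointer. The callback is stored as a `void (*)(void)` and cast back to its real type before the call, which relies on the rule that a function pointer converted to another function pointer type and back compares equal to the original.

```c
#include <assert.h>

/* Hypothetical names; only the overall shape follows the thread. */
typedef enum {
    opt_none,
    opt_index,
    opt_key,
    opt_callback
} opt_tag;

/* Generic function-pointer type, kept separate from void *. */
typedef void (*generic_fn)(void);

typedef struct {
    opt_tag tag;               /* says which union member is live */
    union {
        long long   index;     /* integer payload */
        const char *key;       /* data (object) pointer payload */
        generic_fn  callback;  /* function pointer payload */
    } as;
} option;

static option option_none(void)
{
    option o = { .tag = opt_none };
    return o;
}

static option option_index(long long i)
{
    option o = { .tag = opt_index, .as.index = i };
    return o;
}

static option option_key(const char *k)
{
    option o = { .tag = opt_key, .as.key = k };
    return o;
}

static option option_callback(generic_fn f)
{
    option o = { .tag = opt_callback, .as.callback = f };
    return o;
}

/* Example callback with a concrete signature (an assumption for this sketch). */
static int twice(int x)
{
    return 2 * x;
}

/* Convert the stored pointer back to its real type before calling it. */
static int call_int_callback(option o, int arg)
{
    assert(o.tag == opt_callback);
    int (*fn)(int) = (int (*)(int))o.as.callback;
    return fn(arg);
}
```

A caller would build a value with, say, `option_callback((generic_fn)twice)` and must read only the member that matches `tag`. No integer is ever converted to a pointer or the other way round, so none of the conversions discussed here come into play.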

2 Answers

5

"The problem is that I don't understand if the rules are different re. what I can do with a uintptr_t/intptr_t vs. a void*, and if so, why they would be different?"

The rules are different because they're out at the edge, within a boundary (or a grey area) between what machines can actually do, and what people want to do, and what a language standard says they can do.

Now, yes, on a "conventional" architecture, pointers and ints are both just binary integers of some size, so clearly it's possible to mix'n'match between the two.

And, again yes, this is clearly something that some people find themselves wanting to do. You've got a big, heterogeneous array of things, and some of them are plain numbers, and some of them are data pointers, and maybe some of them are function pointers, and you've got some way of knowing which is which, and sometimes it really does seem tidy to store them in one big heterogeneous array. Or you've got a function with a parameter that sometimes wants to be an integer and sometimes wants to be a pointer, and you're fine with that, too. (Well, except for all the warnings you get from your compilers, and the lectures from language lawyers and SO regulars.)

But then there's the C Standard, which goes to some pains to distinguish between integers, and data pointers, and function pointers. There are architectures where these truly aren't interchangeable, and where it's a bad idea to try. And the C Standard has always tried to accommodate those architectures. (If you don't believe me, ask, because examples do exist.)

The C Standard could say that all pointers and all integers are more freely interchangeable. It sounds like that's what Swift and Rust have done. But it also sounds like Swift and Rust would not be implementable on those hypothetical "exotic" architectures.

These discussions get tricky because they're also at the intersection between language standards and programming practices. If you know you're always going to be using machines where integers and pointers are interchangeable, if you don't care about portability to the other, "exotic" architectures, you could just say so, and ignore the warnings, and move on -- unless your house style guide says that casts are forbidden, or your software development plan says that your code must be strictly conforming and compile without warnings. Then you might find yourself arguing with, or trying to change, the C Standard, just to get around your own SDP. (My advice: make sure that your style guide and your SDP have waiver mechanisms.)

Some day the C standard will probably get less "protective" (enabling?) of those exotic architectures. For example, I've heard it's been debated to drop the accommodation of one's complement and sign/magnitude machines, and to build the definition of two's complement into the next revision of the C Standard. But if that happens, or if other guarantees/accommodations change, it won't mean that C compilers and C programs for the exotic machines can't be written any more -- it will just mean that programmers for those machines will have to apply their own rules (like, "don't assign between int and void * and void (*)()") that aren't actually in the standard any more. (Or, equivalently, it means that strictly conforming code written for "normal" architectures won't automatically be portable to the exotic ones. Also, I guess, that the vendors of compilers for the exotic architectures won't be able to claim standards compliance any more.)

Steve Summit
  • "It sounds like [all pointers and all integers are more freely interchangeable] is what Swift and Rust have done." — sorry if I gave that impression. I doubt that's quite what they do internally, since my example is just a basic one and they can basically store a whole separate struct for any given enum subvalue. That might make a good Q&A of its own since I've been curious as to the underlying implementation but haven't looked into it yet. – natevw Jul 14 '21 at 22:47
  • Nice answer, thanks! What is "SDP"? I know "Session Description Protocol" and I found "Semidefinite Programming" but neither of those makes sense in context. [BTW I did ask https://stackoverflow.com/questions/68386007/how-are-swift-enums-implemented-internally re. Rust/Swift structs…] – natevw Jul 14 '21 at 23:17
  • Sorry: SDP = Software Development Plan. – Steve Summit Jul 14 '21 at 23:50
2

Even if void * values are represented as numbers, that does not mean the compiler handles them as it does numbers.

A uintptr_t is a number; C 2018 7.20.1.4 1 says it designates an unsigned integer type. So it behaves like other unsigned integer types: You can put any number within its range and get the same number back (and you can do arithmetic with it). The paragraph further says any valid void * can be converted to uintptr_t and that converting it back will produce the original pointer (or something equivalent, such as a pointer to the same place but with a different representation). So you can store pointers in uintptr_t objects.
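
As a rough illustration of the two guarantees described in this paragraph, here is a small sketch. The helper names are hypothetical, and it assumes the implementation provides `uintptr_t` at all (the type is optional in the standard). Numbers are kept unsigned and checked against `UINTPTR_MAX`, since the standard only promises that `UINTPTR_MAX` is at least 65535.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers; they assume uintptr_t exists on this implementation. */

/* A valid void * converted to uintptr_t ... */
static uintptr_t pack_pointer(void *p)
{
    return (uintptr_t)p;
}

/* ... and back again compares equal to the original pointer. */
static void *unpack_pointer(uintptr_t v)
{
    return (void *)v;
}

/* An ordinary unsigned number: any value within range is stored unchanged. */
static uintptr_t pack_number(unsigned long long n)
{
    assert(n <= UINTPTR_MAX);  /* the range is implementation-specific */
    return (uintptr_t)n;
}
```

The round trip only goes one way: a value produced by `pack_number` should never be passed to `unpack_pointer`, because converting an arbitrary integer to a pointer is exactly the implementation-defined case. Signed values such as a negative `int` would also need their own handling, as noted in the comments on the question.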

However, the C standard does not say there is a range of numbers you can put into void * and get them back. 6.3.2.3 5 says that when an integer is converted to a pointer type, the result is implementation-defined (except that converting a constant zero to void * yields a null pointer, per 6.3.2.3 3). 6.3.2.3 6 says when you convert a pointer to an integer, the result is implementation-defined. (7.20.1.4 overrides this when the number is a uintptr_t that came from a pointer originally.)

So, if you store a number in a void *, how do you know it will work? The C standard does not guarantee to you that it will work. You would need some documentation for the compiler that says it will work.

Eric Postpischil
  • Thanks, appreciate all the spec references! Sounds like my non-union `uintptr_t` works [modulo fixing/avoiding it for function pointers, signed/unsigned behavior, etc.] simply because 7.20.1.4 says it will, but that's a recent shoo-in. (…and figuring out how to support that on exotic architectures is left as an exercise to the compiler/toolchain developers?) – natevw Jul 14 '21 at 23:22