
I'm working on an existing code base, so the answer "do it right, just use one type" doesn't apply: they already didn't do it right, and I just have to live with it.

I know how coercion works, so this is fine:

int a = 1;
long b = a;

even though int and long are different base types. However, this doesn't work:

int a;
long *b = &a;

because there's no auto conversion between "pointer to int" and "pointer to long".

If, rather than base types, I was working with classes, I could get it to work by providing a conversion. My question is this. Is there a way to provide conversions for base types (or rather pointers to base types) so I could start the process of converting my code to use a single 32-bit integral type? As it stands, I either need to do an "all or nothing" edit OR provide a boatload of explicit casts.
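
For concreteness, this is the kind of user-defined conversion I mean for class types (the names here are just illustrative):

struct A { int v; };
struct B {
    long v;
    B(const A& a) : v(a.v) {}  // converting constructor: implicit A -> B
};

A a{1};
B b = a;  // fine: the user-defined conversion kicks in

But there is no analogous hook I know of for int* -> long*.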

  • A pointer points to a *location*. You don't have a location to store a `long`. You have a location to store an `int`. And an `int` is not a `long`. – Silvio Mayolo Aug 06 '22 at 03:18
  • In my specific situation, an int and a long are both 4 byte signed integers. Same byte pattern, so the code is not seriously broken, and has been running fine for several decades. In practice, in my setting, there is no pragmatic difference between a pointer to an int and a pointer to a long. Windows, not Linux. long is a 4 byte signed type, not an 8 byte signed type. – Tim Williams Aug 06 '22 at 04:00
  • And yes, I'm well aware I'm comparing base types and user-defined classes. I did that to show the sort of thing I wish were possible, where I can define conversions for classes. So, if I felt like doing it, I could define a class that wrapped a pointer to an int and a class that wrapped a pointer to a long, provide appropriate conversion methods for those two classes, override the `*` operator, and do a WHOLE LOT of text editing in my code, and then I'd get the conversion I want. I just wondered if there was a way to do it with base types without the effort of wrapping them in classes. – Tim Williams Aug 06 '22 at 04:05
  • The problem I'm trying to solve is that I'm working on a very old and large Windows-based code base where the choice between long and int was irrelevant and largely depended on the whim of the programmer. I'm trying to normalize to a single signed 32-bit integral data type, since that was the actual intent. But in practice, there are islands of code that use one type, islands of code that use the other, and scattered explicit casts. My goal is to, eventually, have no need of casts, automatic or otherwise. – Tim Williams Aug 06 '22 at 04:17
  • @TimWilliams: "*In my specific situation, an int and a long are both 4 byte signed integers.*" Can't you just change them? If they are both effectively the same type, then just do a find/replace on one to turn it into the other. – Nicol Bolas Aug 06 '22 at 04:27
  • Related topic: [A: What is the strict aliasing rule?](https://stackoverflow.com/a/98702) Even when `int` and `long` have the same size, they are unrelated types as far as the strict aliasing rule goes. – JaMiT Aug 06 '22 at 04:29
  • @TimWilliams technically it's still UB. It violates the [strict aliasing rule](https://stackoverflow.com/q/98650/995714), which is the most serious issue: compilers can do anything, and your code may break after optimization. So regardless of the size you need to do correct type punning via `memcpy` in older C++ or `std::bit_cast` in C++20 or newer. Size is irrelevant here; for example, `float` and `int` may have the same size, and casting `int*` to `float*` or vice versa still won't work – phuclv Aug 06 '22 at 04:29
  • Like I said, a large code base. The code churn to do that is not trivial. I'm aiming to get there, but the odds of making some sort of mistake if I do it all at once are too great. Further, I'm dealing with scattered other typedefs, such that at last count I have something like a dozen different 32-bit signed types, although all resolve down to either int or long. – Tim Williams Aug 06 '22 at 04:30
  • *"I either need to do an 'all or nothing' edit"* -- what's the downside here? Do a *careful* text search-and-replace of the identifier "int" to "int32_t", and of the identifier "long" to "int32_t". Just be careful that you only convert full identifiers. You could possibly even do this in pieces without making the current situation worse. – JaMiT Aug 06 '22 at 04:30
  • In theory, yes, int and long COULD be done with different byte patterns, but in practice, in Windows, they've been using the same byte pattern for 30+ years. I tried to head off this whole part of the discussion. Any answer of "don't do it that way" is useless; it's been done the way it's been done for 20+ years. What I'm trying to do is clean it up, and incremental change, not wholesale search-and-replace, is mandatory. Honestly, any answer other than telling me how to provide an auto-conversion or an explanation of why it's not possible is wasted typing. I'm well aware of the issues. – Tim Williams Aug 06 '22 at 04:33
  • @Alexander that won't work when you pass pointers around. @Tim as said, the only way to make this work correctly without replacing all types is to type pun properly with `memcpy` or `std::bit_cast` (sketched below, after these comments), period. There's no safe way to dereference pointers of another type: compilers can assume the values are different and reuse the old value in registers, and your code will break. **Enable compiler warnings and you'll see that it shouts at you when using casts like that** – phuclv Aug 06 '22 at 04:37
  • Another problematic portion is the code base consumes libraries and system libraries from all over the place. And as is traditional in old Windows code bases, they also have mixed and matched int and long, so even if I clean my stuff up internally, I'm still going to need conversions. So even if I just change all the definitions of my types to a single base 32bit type, I end up needing to add lots of casts to interact with the rest of the system. And I need to do a lot of editing anyway, since not ALL of my code uses typedefed aliases for base types, a lot use the base types directly. – Tim Williams Aug 06 '22 at 04:37
  • Standard Windows APIs won't mix `int` and `long`. Only bad third-party library code does. You can work around this, possibly trading a little bit of performance, using gcc or a few other compilers like clang with `__attribute__((__may_alias__))`, but that extension won't work in MSVC – phuclv Aug 06 '22 at 04:40
  • A lot of what I'm working on has its origin in plain C code, so we didn't have fancy C++ converters. We CAREFULLY cast back and forth between signed and unsigned, also. It's quite possible to get that all working, and working well, but it's ugly and error prone and I'm trying to remove all that messiness. But it truly does have to be done little by little. If for no other reason than the others on my team would never sign off on such a big code change, even if I convinced myself I got all of the editing done correctly all at once. – Tim Williams Aug 06 '22 at 04:46
  • @TimWilliams *"they've been using the same byte pattern for 30+ years"* -- you've missed the point. Using the same byte patterns is why a `memcpy` approach could work, but it has no bearing on strict aliasing. Some compilers will let you do the aliasing you want to do, possibly after providing the right compiler option. (I did call strict aliasing a related topic rather than an answer.) However, if you decide to ignore the strict aliasing rules, then Murphy's Law says that it will come back to bite you at the worst possible time. – JaMiT Aug 06 '22 at 05:49
  • If you are willing to dive deep into the UB waters, just `#define long int`. Much easier than the casts, and just as legal (i.e. not at all). – n. m. could be an AI Aug 06 '22 at 06:07
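
A minimal sketch of the well-defined type punning recommended in the comments above, assuming (as on Windows) that `int` and `long` have the same size:

#include <cstring>

static_assert(sizeof(int) == sizeof(long), "this punning assumes equal sizes");

long read_as_long(const int* p) {
    long out;
    std::memcpy(&out, p, sizeof out);  // compilers optimize this to a plain load
    return out;
    // C++20 alternative (needs <bit>): return std::bit_cast<long>(*p);
}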

1 Answer


My question is this. Is there a way to provide conversions for base types (or rather pointers to base types) so I could start the process of converting my code to use a single 32-bit integral type?

No, there is no way to influence the set of implicit conversions between pointer types, aside from the derived-to-base conversions that inheritance relations between classes provide.


Even if you add a `(long*)` cast or `reinterpret_cast<long*>` everywhere, as mentioned in the comments, accessing the value through that pointer will be an aliasing violation and therefore cause undefined behavior. This is not related to the size, alignment or representation of the `int` and `long` types. Rather, compilers are explicitly allowed to make optimizations that assume a `long` pointer can never point to an `int` object, and they do perform such optimizations, which can break code in very subtle ways.
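
For illustration, here is the classic shape of code that such an optimization breaks (a hypothetical example; the exact outcome depends on compiler and optimization level):

long f(int* i, long* l) {
    *l = 1;
    *i = 2;      // the compiler may assume this cannot modify *l ...
    return *l;   // ... and fold the return value to the constant 1
}

int x;
long r = f(&x, reinterpret_cast<long*>(&x));  // may yield 1 even though x == 2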

Note that this is different for casts between e.g. signed int* and unsigned int*. Signed and unsigned variants of the same integral type are allowed to alias one another, so that either pointer type can be used to access the object. The compiler is not allowed to perform optimizations in this case that assume that pointers of the two types don't point to the same address at the same time.
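
For example, this is well-defined:

int n = -1;
unsigned int* u = reinterpret_cast<unsigned int*>(&n);
unsigned int bits = *u;  // OK: unsigned int may alias int; UINT_MAX here on two's complement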

GCC and Clang offer the `-fno-strict-aliasing` option to disable optimizations based on the aliasing rules (still assuming that the types do actually have the same size, alignment and compatible representations), but I don't know whether MSVC has a similar option. Some compilers may also explicitly allow additional types to alias beyond what the standard allows, but I would only rely on that where the compiler documents it clearly. I don't know whether MSVC makes any such guarantees for `int` and `long`.
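
For completeness, the GCC/Clang extension mentioned in the comments looks like this (a sketch; not available in MSVC):

// A may_alias-qualified typedef opts the type out of strict-aliasing analysis:
typedef long __attribute__((__may_alias__)) long_a;

int y = 42;
long_a* p = reinterpret_cast<long_a*>(&y);
long_a value = *p;  // these compilers define this read (same size and
                    // representation still assumed); a plain long* would not
                    // get that guarantee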

user17732522
  • My environment is strictly limited Windows and MSVC. Whatever the language spec says about this aliasing violation (and yes, it IS an aliasing violation), in this environment, it's safe. Changing that behavior of MSVC would break an enormous amount of Windows itself. Idiomatically back in the 90s, we didn't differentiate between the two, and the amount of existing code pretty much guarantees that MSVC will keep that contract. – Tim Williams Aug 06 '22 at 05:53
  • @TimWilliams I don't have experience with MSVC. If you know that it does not perform optimizations based on the aliasing rules, then you can disregard that section of my answer. Unfortunately the answer to the actual question you asked is simply that it is not possible. – user17732522 Aug 06 '22 at 05:57
  • there's no option similar to `-fno-strict-aliasing` in MSVC. Previously there was the [`/Ow` Assume Aliasing Across Function Calls](https://web.archive.org/web/20130726160200/http://msdn.microsoft.com/en-us/library/aa984741(v=vs.71).aspx) option, but [it's been removed](https://learn.microsoft.com/en-us/cpp/porting/visual-cpp-what-s-new-2003-through-2015?view=msvc-170). MS says that we need to *use the `noalias` or `restrict` `__declspec` modifiers to specify how the compiler does aliasing.* – phuclv Aug 06 '22 at 06:11
  • @phuclv But (weirdly) that seems to not be about type-based aliasing rules. Based on questions like https://stackoverflow.com/questions/37176461/does-visual-c-support-strict-aliasing it does indeed seem that MSVC allows arbitrary aliasing (with regards to types), assuming the answer is still up-to-date. – user17732522 Aug 06 '22 at 06:19
  • @TimWilliams MSVC definitely [does strict aliasing](https://stackoverflow.com/q/25347747/995714) and breaks a lot of code, because the rule has been there for so long, lasting [at least since C89](https://stackoverflow.com/q/6514663/995714). When MS began to implement stricter C and C++ compliance it broke many things, including the Windows SDK, so MS had to fix many things and rewrite lots of other things from scratch. They also open-sourced the STL so bugs could be found faster. Sometimes you have to do a large rewrite or refactor to keep future maintenance cost low – phuclv Aug 06 '22 at 06:24
  • @user17732522 indeed, they're less aggressive than gcc and clang, but they do. I've had problems in the Windows SDK before. And now that MS has [rewritten MSVC's code optimizer](https://blogs.msdn.microsoft.com/vcblog/2016/05/04/new-code-optimizer/) they'll do even more optimizations and exploit more UB for optimization – phuclv Aug 06 '22 at 06:26
  • @phuclv The example you linked appears to be a bug per the answer. Aliasing any type with `unsigned char*` (which I assume `uchar*` to be) is always allowed. (Although technically pointer arithmetic on the result and the (partial) object representation assignment as used in the example is currently not defined by the standard. That is a defect though and I don't think any compiler misbehaves on that.) – user17732522 Aug 06 '22 at 06:29
  • @user17732522 yes, it's a bug. But the fact that they have an option for us to tell that aliasing doesn't happen means they do assume no alias in many cases – phuclv Aug 06 '22 at 06:32
  • @phuclv, I wrote code for MSFT in the 90's and mid 2000's. The amount of existing code that relies on the ability to cast back and forth between int and long and between int* and long* is amazing. Even if MSVC were able to do things that break that, there's no way it'd not be an overridable option. Take a look at the open source for the ESE ISAM engine. It's riddled through with that idiom, and that's an active code base at MSFT. It's the data engine under Active Directory, so it's a critical part of Windows Server. – Tim Williams Aug 06 '22 at 07:48