
In Visual Studio, it seems like pointers to member variables are 32-bit signed integers behind the scenes (even in 64-bit mode), and a null pointer is -1 in that context. So if I have a class like:

#include <iostream>
#include <climits> // INT_MAX

struct Foo
{
    char arr1[INT_MAX];
    char arr2[INT_MAX];
    char ch1;
    char ch2;
};


int main()
{
    auto p = &Foo::ch2;
    std::cout << (p?"Not null":"null") << '\n';
}

It compiles and prints "null". So, am I causing some kind of undefined behavior, or was the compiler supposed to reject this code, making this a compiler bug?

Edit:

It appears that I can keep the "2 INT_MAX arrays plus 2 chars" pattern, and only in that case the compiler allows me to add as many members as I wish; the second character is always considered to be null. See demo. If I change the pattern slightly (like 1 or 3 chars instead of 2 at some point), it complains that the class is too large.

Aykhan Hagverdili
  • Comments are not for extended discussion; this conversation has been [moved to chat](https://chat.stackoverflow.com/rooms/222338/discussion-on-question-by-ayxan-haqverdili-why-is-this-pointer-null). – Samuel Liew Oct 01 '20 at 02:20
  • Why shouldn't it? If you don't give your reasoning for it to be addressed, you're just asking for the documenation to be rewritten. – philipxy Oct 06 '20 at 18:37
  • @philipxy what do you mean? A pointer that isn't set to null shouldn't be null. That's the reason. – Aykhan Hagverdili Oct 06 '20 at 19:50
  • I just explained why you need to say why you expect what you expect. Also that comment reasoning is too vague & incomplete to address. Also please clarify via edits, not comments. – philipxy Oct 06 '20 at 19:58
  • @philipxy I don't think there's anything to clarify as no one else had any problem understanding why a pointer not set to null shouldn't be null. – Aykhan Hagverdili Oct 06 '20 at 21:48
  • Whether people guess correctly at your misconception correctly is irrelevant to what makes a good question. – philipxy Oct 06 '20 at 21:51
  • Your pointer has a garbage value; I think it's null just by randomness. Try creating a Foo struct instance and check whether the pointer still gives you a null value. I don't think so. Your char is simply initialized to a null or non-null value randomly because you don't have a struct instance, only a declaration. And you're right when you say that pointers are not null by default. –  Oct 07 '20 at 09:22
  • @EdoardoRosso why do you think it has garbage value? I think you should read [Pointer to class data member “::*”](https://stackoverflow.com/q/670734/10147399) – Aykhan Hagverdili Oct 07 '20 at 11:16
  • @EdoardoRosso no, you're wrong. struct *is* a class. – Aykhan Hagverdili Oct 07 '20 at 11:51
  • You're pointing to a non-existent variable anyway. I used to write in C; I looked it up and found out that you're right, struct is a "special" type of class in C++. But be that as it may, you're pointing at a declared var that has not been allocated; the pointer has a garbage value. –  Oct 07 '20 at 12:16
  • @EdoardoRosso No, it's not garbage. A member pointer isn't a pointer at all. It does not require an object to exist. – Aykhan Hagverdili Oct 07 '20 at 12:20
  • It requires an object to exist if you want p to point to memory that makes sense. You're printing the value of p at runtime when it points to, literally, NOTHING. –  Oct 07 '20 at 12:49
  • @AyxanHaqverdili: Could you grab the value of `offsetof` for all the members of the struct? That might be helpful. – Bill Lynch Oct 07 '20 at 14:19
  • It's obviously an implementation limits problem, but the thing is, that the standard doesn't really require an implementation to document them. The standard does say implementations [*should*](https://timsong-cpp.github.io/cppwp/n4861/implimits#1.sentence-2) document this. But unfortunately, the entire \[implimits\] section is informative, not normative text. – StoryTeller - Unslander Monica Oct 07 '20 at 14:45
  • @StoryTeller-UnslanderMonica if I add a third `char arr3[INT_MAX]`, it fails to compile with an error indicating the class is too large. This particular example, however, compiles and prints null, which is interesting. You apparently can't have an array larger than 0x7fffffff bytes, but you can have a class larger than that, which causes this unexpected behavior. Is it compliant behavior for a compiler to compile this code even though it can trivially detect that the class cannot be handled meaningfully? – Aykhan Hagverdili Oct 07 '20 at 14:54
  • @BillLynch The offsets are 0, 2147483647, 4294967294, 4294967295, respectively. This seems right. – Aykhan Hagverdili Oct 07 '20 at 14:57
  • The IntelliSense parser correctly identifies the problem, the compiler does not. A bit tricky to do since this issue can only be detected in the back-end. It is an x64 code generator limitation, objects cannot be larger than 2GB. Beyond that a very different way to generate the address needs to be used, LEA can't work anymore due to the displacement overflow. Not available. Use Help > Send Feedback > Report a Problem – Hans Passant Oct 07 '20 at 15:17
  • @HansPassant IntelliSense identifies the problem in x86 configuration, but not in x64 configuration. – Aykhan Hagverdili Oct 07 '20 at 15:21
  • Another fun result: in x64 mode, `&Foo::arr3 == &Foo::arr1`. – ecatmur Oct 07 '20 at 15:29
  • @AyxanHaqverdili the bad thing about the MS compiler is that behind the scenes it's still limited to 32 bits in some parts of the code; for a long time it was 32-bit only, even while producing 64-bit code. But gcc shows similar behavior. And -1 results in nullptr, while actually ANY memory location with a negative offset should be nullptr according to the standard (an address "before" the beginning of the struct) – Swift - Friday Pie Oct 08 '20 at 16:09
  • @Swift-FridayPie: The standard says absolutely nothing about representation of pointer-to-member. – Ben Voigt Oct 08 '20 at 16:12

4 Answers


The size limit of an object is implementation-defined, per Annex B of the standard [1]. Your struct is of an absurd size.

If the struct is:

struct Foo
{
    char arr1[INT_MAX];
    //char arr2[INT_MAX];
    char ch1;
    char ch2;
};

... the size of your struct in a relatively recent version of 64-bit MSVC appears to be around 2147483649 bytes. If you then add in arr2, suddenly sizeof will tell you that Foo is of size 1.

The C++ standard (Annex B) says the compiler should document such limitations, which MSVC does [2]. It states that it follows the recommended limit. Annex B suggests a minimum of 262144 bytes for the size of an object. While it's clear that MSVC can handle more than that, it documents that it follows that minimum recommendation, so I'd assume you should take care when your object size exceeds it.

[1] http://eel.is/c++draft/implimits

[2] https://learn.microsoft.com/en-us/cpp/cpp/compiler-limits?view=vs-2019

computerquip
  • That's about size of an object, and we don't have an object of type `Foo` here. – Aykhan Hagverdili Oct 08 '20 at 06:59
  • @AyxanHaqverdili: Strictly speaking, MSVC has a limit on the size of types, which indeed can be slightly different from a limit on the size of objects (especially in regard to arrays). Appendix B allows both limits, but suggests only a value for objects. – MSalters Oct 08 '20 at 10:31
  • So the answer is that I am causing undefined behavior, or was the compiler supposed to reject this code? – Aykhan Hagverdili Oct 08 '20 at 18:29
  • It's undefined what happens when you go past said limit: the standard doesn't define what happens past it, nor does it define what `sizeof` should yield in the example above. The `sizeof` result cannot be 0, however, hence the value of 1. From a usability standpoint, it's sort of ridiculous that the compiler doesn't say *something* about the limitation being reached, though. – computerquip Oct 09 '20 at 00:25
  • @AyxanHaqverdili While there isn't an explicit object of type Foo here, when you do thing like "sizeof", it's often dealing with the object representation of a given type or in respect to an object of a type. For example, a pointer to non-static class member is described as something "which identify members of a given type within objects of a given class". http://eel.is/c++draft/basic#compound-1.8 – computerquip Oct 09 '20 at 01:30

It's clearly a collision between an optimization on pointer-to-member representation (use only 4 bytes of storage when no virtual bases are present) and the pigeonhole principle.

For a type X containing N subobjects of type char, there are N+1 possible valid pointer-to-members of type char X::*: one for each subobject, and one for the null pointer-to-member.

This works when there are at least N+1 distinct values in the pointer-to-member representation, which for a 4-byte representation implies that N+1 ≤ 2^32 and therefore the maximum object size is 2^32 − 1.

Unfortunately the compiler in question made the maximum object-type size (before it rejects the program) equal to 2^32, which is one too large and creates a pigeonhole problem: at least one pair of pointer-to-members must be indistinguishable. It's not necessary that the null pointer-to-member be one half of this pair, but as you've observed, in this implementation it is.

Ben Voigt
  • Good point. It appears that I can keep adding as many members as I wish so long as I keep the 'two INT_MAX arrays + two char' pattern, and it'll mark the last one as null. https://godbolt.org/z/q99786 – Aykhan Hagverdili Oct 08 '20 at 18:19
  • @AyxanHaqverdili: Oh my, that still doesn't trigger a "class too large" error? Looks like that particular diagnostic is entirely broken. – Ben Voigt Oct 08 '20 at 18:40

The expression &Foo::ch2 is of type char Foo::*, which is a pointer to member of class Foo. By the rules, a pointer to member converted to bool evaluates as false ONLY if it is a null member pointer, i.e. it had nullptr assigned to it.

The fault here appears to be an implementation flaw. E.g., with gcc targeting x86-64, any assigned pointer to member evaluates as non-null unless it had nullptr assigned to it, as the following code shows:

#include <iostream>
#include <climits> // LLONG_MAX, ULLONG_MAX
#include <cstddef> // offsetof

struct foo
{
    char arr1[LLONG_MAX];
    char arr2[LLONG_MAX];
    char ch1;
    char ch2;
};

int main()
{
    char  foo::* p1 = &foo::ch1;
    char  foo::* p2 = &foo::ch2;
    std::cout << (p1?"Not null ":"null ") << '\n';
    std::cout << (p2?"Not null ":"null ") << '\n';
    
    std::cout << LLONG_MAX + LLONG_MAX << '\n'; // signed overflow: UB
    std::cout << ULLONG_MAX << '\n';
    std::cout << offsetof(foo, ch1) << '\n';
}

Output:

Not null 
null 
-2
18446744073709551615
18446744073709551614

Likely it's related to the fact that the class size exceeds platform limitations, causing the offset of the member to wrap around past 0 (the internal value of nullptr). The compiler doesn't detect it because it becomes a victim of... integer overflow with signed values, and it's the programmer's fault for causing UB inside the compiler by using signed literals as array sizes: LLONG_MAX + LLONG_MAX = -2 would be the "size" of the two arrays combined.

Essentially the size of the first two members is computed as negative, and the offset of ch1 is -2, represented as the unsigned value 18446744073709551614. Since that is not the null value, the pointer is not null. Another compiler might clamp the value to 0, producing a null pointer, or actually detect the existing problem, as clang does.

If the offset of ch1 is -2, then is the offset of ch2 -1? Let's add this:

std::cout << static_cast<long long>(offsetof(foo, ch1)) << '\n';
std::cout << static_cast<long long>(offsetof(foo, ch2)) << '\n';

Additional output:

-2
-1

And the offset of the first member is obviously 0, so if member pointers are represented as offsets, another value is needed to represent nullptr. It's logical to assume that this particular compiler considers only -1 to be the null value, which may or may not be the case for other implementations.

Swift - Friday Pie
  • To your point that _"on gcc compilers any assigned pointer to member evaluates to non-null"_: gcc and clang both return null when compiled with -m32: https://godbolt.org/z/znn5En – Bill Lynch Oct 08 '20 at 15:03
  • @BillLynch that's changing the limitations.. but I was about to add to that, just had to look into the code and experiment – Swift - Friday Pie Oct 08 '20 at 15:07
  • And note that `offsetof(struct Foo, ch2) == 0xffffffff`, not `0`. – Bill Lynch Oct 08 '20 at 15:17
  • @BillLynch Can't reproduce your results with either -m32 or -march=x86-64 and LLONG_MAX; maybe you have some uncommon build. Attempted 4.8 and 10/11. 0xffffffff is -1, which is non-nullptr; it's the result of wrap-around. gcc with -m32 returns non-null; you probably use some inconvenient g++ build or a different runtime library (that may affect it too). clang would detect the problem correctly – Swift - Friday Pie Oct 08 '20 at 15:28
  • To make your code similar to the OPs, you should be doing `char foo::*p = &foo::ch2`, not `ch1`. When fixed, this returns `"null"` as well. https://godbolt.org/z/Gc3eEo – Bill Lynch Oct 08 '20 at 15:41
  • Whoops. Bad godbolt link. Trying again: https://godbolt.org/z/6bExnM – Bill Lynch Oct 08 '20 at 15:46
  • ah.. that explains it. gcc uses -1 as a magic number. That's implementation-specific – Swift - Friday Pie Oct 08 '20 at 15:49
  • @Swift-FridayPie [MSVC uses -1 as the null value for pointer to member](https://stackoverflow.com/a/2761414/995714) – phuclv Oct 08 '20 at 16:08
  • @Swift-FridayPie no, it's just the 0xFFFFFFFF bit pattern that defines the null pointer. It's neither an offset nor a real address. The standard allows NULL to be any bit pattern and quite a lot of platforms use 0xFFFFFFFF for null like [AMD GCN](https://stackoverflow.com/q/41102947/995714). See also http://c-faq.com/null/machexamp.html – phuclv Oct 08 '20 at 16:15
  • it appears that MSVC also uses -1 as null-pointer-to-member. – Aykhan Hagverdili Oct 08 '20 at 17:59
  • I am not causing any signed overflow, since I am not doing any signed arithmetic in here. What the compiler does internally isn't my code. Pointers are not integers, they're an abstraction, which seems to be leaking here. – Aykhan Hagverdili Oct 08 '20 at 18:00

When I test the code, VS shows a warning: 'Foo': the class is too large.

When I add char arr3[INT_MAX], Visual Studio reports Error C2089: 'Foo': 'struct' too large. Microsoft Docs explains it as: the specified structure or union exceeds the 4GB limit.

Barrnet Chou
  • In my test, that error happens only when I add a third array with INT_MAX elements. Other comments also verify the behavior I observed with 2 arrays. Maybe you're not doing a clean build? I am not sure. – Aykhan Hagverdili Oct 01 '20 at 07:16
  • I have cleaned and rebuilt my project; it may have something to do with everyone's compiler settings. What I mean is that it may be because the elements in the struct exceed the compiler's limit for a `struct`. – Barrnet Chou Oct 01 '20 at 08:15
  • And interestingly, when you comment out `char ch1;` or change `INT_MAX` to `INT_MAX-1`, the result of the program is `NOT NULL`. – Barrnet Chou Oct 01 '20 at 08:18
  • that's because the class is carefully written to cleanly wrap the pointer around to -1. When you change stuff, that doesn't happen. – Aykhan Hagverdili Oct 01 '20 at 08:39
  • Please [use text, not images/links, for text--including tables & ERDs](https://meta.stackoverflow.com/q/285551/3404097). Use images only for what cannot be expressed as text or to augment text. Include a legend/key & explanation with an image. – philipxy Oct 06 '20 at 18:33
  • The first warning doesn't happen in x64 configuration. – Aykhan Hagverdili Oct 07 '20 at 15:24
  • In my opinion, this is related to the memory allocation of the struct. – Barrnet Chou Oct 09 '20 at 01:14