
I was told that the following code has undefined behavior until C++20:

int *p = (int*)malloc(sizeof(int));
*p = 10;

Is that true?

The argument was that the lifetime of the int object is not started before assigning the value to it (P0593R6). To fix the problem, placement new should be used:

int *p = (int*)malloc(sizeof(int));
new (p) int;
*p = 10;

Do we really have to call a trivial default constructor just to start the lifetime of the object?

At the same time, the code does not have undefined behavior in pure C. But what if I allocate an int in C code and use it in C++ code?

// C source code:
int *alloc_int(void)
{
    int *p = (int*)malloc(sizeof(int));
    *p = 10;
    return p;
}

// C++ source code:
extern "C" int *alloc_int(void);

auto p = alloc_int();
*p = 20;

Is it still undefined behavior?

Remy Lebeau
anton_rh

2 Answers


Is it true?

Yes. Technically speaking, no part of:

int *p = (int*)malloc(sizeof(int));

actually creates an object of type int, so dereferencing p is UB since there is no actual int there.

Do we really have to call a trivial default constructor just to start the lifetime of the object?

Per the C++ object model, do you have to do this to avoid undefined behavior pre-C++20? Yes. Will any compiler actually cause harm if you don't? Not that I'm aware of.
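A minimal sketch of the pre-C++20 pattern, for illustration only. The helper names `make_int` and `destroy_int` are hypothetical and not part of the question. The sketch uses the pointer returned by the placement new-expression, since that is the pointer that refers to the newly created object.

```cpp
#include <cassert>  // assert, for callers that want to check the result
#include <cstdlib>  // std::malloc, std::free
#include <new>      // placement new

// Hypothetical helper: returns a pointer to a live int that sits in
// malloc'ed storage, or nullptr if the allocation fails.
int* make_int(int value) {
    void* raw = std::malloc(sizeof(int));
    if (raw == nullptr) {
        return nullptr;
    }
    // The new-expression creates the int object and starts its lifetime.
    // malloc returns storage suitably aligned for any fundamental type.
    return new (raw) int(value);
}

// Hypothetical helper: int is trivially destructible, so releasing the
// storage is enough; no explicit destructor call is needed.
void destroy_int(int* p) {
    std::free(p);
}
```

With these helpers, `int* p = make_int(10); *p = 20; destroy_int(p);` writes only to an object whose lifetime has begun, so the question does not arise under any C++ standard.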

[...] Is it still undefined behavior?

Yes. Pre-C++20, you still didn't actually create an int object anywhere so this is UB.

Barry
  • Why isn't the language in https://timsong-cpp.github.io/cppwp/n3337/basic.life#1.1 sufficient for it not to be UB? After all, storage of proper size and alignment was obtained for `int` in the example -- the lifetime of the `int` object begins there. – avakar Oct 26 '20 at 10:40
  • @avakar because `[intro.object]` is an exhaustive listing of how objects are created, and "storage was allocated" isn't one of them (until C++20) – Caleth Apr 05 '23 at 16:27
  • @Caleth: That inconsistency means there was a defect in the Standard. It did not make `[basic.life]` non-authoritative. (Furthermore that "exhaustive list" in `[intro.object]` said "An object is created by a definition, by a *new-expression* or **by the implementation when needed**" which is sufficiently catch-all to include the needs of `[basic.life]`) – Ben Voigt Apr 05 '23 at 19:13
  • @Caleth: Which of the exhaustive list in `[intro.object]` created the array of `unsigned char` which is referred to in https://eel.is/c++draft/basic.types#general-4 ? Clearly `[intro.object]` is not and never has been exhaustive. – Ben Voigt Apr 05 '23 at 19:22
  • @BenVoigt Except we're talking about pre-C++20, where the wording did not have the "by the implementation when needed" part - it *was* an exhaustive list: https://timsong-cpp.github.io/cppwp/n4659/intro.object#1 – Barry Apr 05 '23 at 19:30
  • @Barry: I copied that sentence from C++03, which absolutely is pre-C++20. The English grammar of the sentence you quoted does not claim exclusivity -- the linking verb "is" is being used in passive voice, not to exhaustively define "creation of an object". – Ben Voigt Apr 05 '23 at 19:53
  • @BenVoigt the wording "by the implementation ([`[class.temporary]`](https://timsong-cpp.github.io/cppwp/n3337/class.temporary)) when needed" is rvalues of class type, which changes in C++17 to "when a temporary object is created ([conv.rval], [class.temporary])" – Caleth Apr 06 '23 at 08:11
  • @BenVoigt and the thing that creates the `unsigned char[]` is the C++ memory model. An object, during its lifetime, *is* the storage location that it lives in, and a storage location is a sequence of bytes, and [basic.types] permits you to observe those bytes – Caleth Apr 06 '23 at 08:30
  • @Caleth: "the thing that creates the unsigned char[] is the C++ memory model" Well yes, it does. But `[basic.types]` calls them **objects**, and your allegedly exhaustive list of object creation doesn't list them, so it obviously is not exhaustive. – Ben Voigt Apr 06 '23 at 14:46
  • @BenVoigt "An object occupies a region of storage in its period of construction ([class.cdtor]), throughout its lifetime, and in its period of destruction ([class.cdtor])." The region of storage can be `reinterpret_cast`ed as an `unsigned char[]`, that's one of the ways multiple objects can have the same address. – Caleth Apr 06 '23 at 15:11
  • @BenVoigt If you were correct, why would the committee vote in [a paper that claimed otherwise](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p0593r6.html#idiomatic-c-code-as-c)? – Caleth Apr 06 '23 at 15:16
  • @Caleth: A committee vote is on whether to accept the changes recommended in the paper. It does not mean that the entire paper is free of mistakes. – Ben Voigt Apr 06 '23 at 15:29
  • @BenVoigt did you read the paper? Is there any part of that paper that *doesn't* hinge on you being wrong? – Caleth Apr 06 '23 at 15:30
  • @Caleth: Although I believe that `[basic.life]` already provided for implicit object creation, I agree that it is valuable to update `[intro.object]` to make mention of it. So if I were on the committee, I would vote for the changes despite the paper making wrong claims. – Ben Voigt Apr 06 '23 at 15:31

Yes, it was UB. The list of ways an int can exist was enumerated, and none applies there, unless you hold that malloc is acausal.
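As a rough illustration of that enumerated list, the sketch below exercises the four ways C++17 (N4659 [intro.object], linked in the comments on the other answer) says an object is created: a definition, a new-expression, a change of the active union member, and a temporary. The names `IntOrFloat` and `sum_of_created_ints` are hypothetical and exist only for this example.

```cpp
#include <cassert>  // assert, for callers that want to check the result
#include <cstdlib>  // std::malloc, std::free
#include <new>      // placement new

// Hypothetical union, used only to show a change of active member.
union IntOrFloat {
    int i;
    float f;
};

// Creates one int in each of the four ways and returns their sum,
// or -1 if the allocation fails.
int sum_of_created_ints() {
    int a = 1;                    // (1) a definition

    void* raw = std::malloc(sizeof(int));
    if (raw == nullptr) {
        return -1;
    }
    int* b = new (raw) int(2);    // (2) a new-expression (placement form)

    IntOrFloat u;
    u.f = 0.0f;                   // f becomes the active member
    u.i = 3;                      // (3) assignment makes i the active member

    const int& d = 4;             // (4) a temporary int, bound to a reference

    int sum = a + *b + u.i + d;
    std::free(raw);               // int needs no destructor call
    return sum;
}
```

Note that a bare `malloc` followed by `*p = 10` matches none of the four cases, which is the gap this answer refers to.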

It was widely considered a flaw in the standard, but one of low importance, because the optimizations done by C++ compilers around that particular bit of UB didn't cause problems with that use case.

As for the second question, C++ does not mandate how C++ and C interact. So all interaction with C is ... UB, i.e. behaviour undefined by the C++ standard.

Yakk - Adam Nevraumont
  • Ad 2nd question: Well, at least the C code in the C standard library can be called with defined results: The C++ standard defines that interaction perfectly well. That bodes well for general interoperability even if that had not been a core design paradigm. – Peter - Reinstate Monica Aug 13 '20 at 00:13
  • Can you expand on the enumerated list of ways for an int to exist? I remember asking a similar question about the lifespan of primitive types, and being told that a primitive could "exist" simply by saying it exists because the spec didn't say otherwise. It sounds like I may have missed out on a useful section of the spec! I'd love to know what section I should have perused! – Cort Ammon Aug 13 '20 at 01:48
  • @CortAmmon: Maybe you misremember the fact that just writing a primitive at a given address is sufficient for it to start existing? – Matthieu M. Aug 13 '20 at 09:58
  • @CortAmmon The enumerated list of ways for an object (of any type) to exist in C++20 is in [\[intro.object\]](https://eel.is/c++draft/basic#intro.object-1.sentence-2): (1) by definition (2) by new-expression (3) implicitly per the new rules in P0593 (4) changing the active member of a union (5) temporary. (3) is new in C++20, (4) was new in C++17. – Barry Aug 13 '20 at 13:35
  • Is C/C++ interaction really UB? It would make more sense to be implementation-defined, rather than undefined, otherwise it'd be strange to even have the `extern "C"` syntax at all. – Ruslan Aug 13 '20 at 15:46
  • @Ruslan: Implementations are free to define any behaviour ISO C++ leaves undefined. (For example `gcc -fno-strict-aliasing`, or MSVC by default). Saying "implementation defined" would *require* all C++ implementations to define some way in which they interoperate with some C implementation, so it makes sense to leave it fully up to the implementation whether they want to do anything like that or not. – Peter Cordes Aug 13 '20 at 18:58
  • @PeterCordes: I wonder why so many people fail to recognize that distinction between IDB and UB, and adopt some fanciful notion that the Standard's failure to mandate that all implementations process a construct meaningfully implies a judgment that no implementations should be expected to do so, and implementations which don't do so must not as a consequence be viewed as inferior. – supercat Aug 13 '20 at 20:18
  • @supercat I guess the problem here is that there're two kinds of UB: the good-reason UB like dereferencing the null pointer and the bad-reason UB like a threat of nasal demons for using core language syntax. – Ruslan Aug 13 '20 at 20:27
  • @Ruslan The problem here is that you think that there are two kinds of UB. Well, there sort of are: there is UB that your compiler defines behavior for, and there is UB it does not. – Yakk - Adam Nevraumont Aug 13 '20 at 21:26
  • @Yakk-AdamNevraumont: Into which category should one place situations where part of the Standard and a compiler's documentation would together describe the behavior of some construct, but another part of the Standard says it's Undefined Behavior, and the compiler's documentation says nothing about which part should be given priority? IMHO, quality compilers should expressly document any situations where "undefinedness" would have priority, and the lack of such documentation should imply that the compiler would behave as specified without regard for whether the Standard requires it. – supercat Aug 13 '20 at 21:56
  • @super (a) press "ask question" for questions, (b) ask concrete questions; you are describing a complex hypothetical with no evidence that it exists or, if it does exist, that you included the right nuance. In my experience, the standard attempts to be clear (ish, it is standardese) when something that would otherwise be defined behaviour is undefined. – Yakk - Adam Nevraumont Aug 21 '20 at 03:39
  • @Yakk-AdamNevraumont: You said you recognize two types of UB, and so I was asking into which of your categories you would place the form I described. – supercat Aug 21 '20 at 05:15
  • @Yakk-AdamNevraumont: The "complex hypothetical" is in fact an extremely common situation, and the source of most contention regarding the Standard. For example, if a compiler specifies that `uint32_t` values are stored as four octets little-endian, and `uint16_t` values are stored as two octets little-endian, then in the absence of N1570 6.5p7 or equivalent that specification would fully describe the behavior of code that converts the address of a `uint32_t` to a `uint16_t*` and accesses it. So would the compiler documentation of storage formats count as "documenting the behavior"? – supercat Aug 21 '20 at 16:21