10

I have the following example

#include <stdlib.h>
#include <stdio.h>
#include <stddef.h>

typedef struct test{
    int a;
    long b;
    int c;
} test;

int main()
{
    test *t = (test*) malloc(offsetof(test, c));
    t -> b = 100;
}

It works fine, but I'm not sure about it. I think I have UB here. We have a pointer to an object of a structure type, but the object of the structure type is not really valid, because the allocated storage is too small to hold it.

I went through the standard and could not find any definition of this behavior. The only section I could find close to this one is 6.5.3.2:

If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined

But this is not really relevant since the pointer returned by malloc is completely valid.

Is there a reference in the standard explaining such a behavior? I'm using C11 N1570.

St.Antario
  • `` is not standard C. – melpomene Dec 23 '18 at 14:57
  • What is the purpose of something like that? Why do you need to do this? What's the use-case? What is the *real* problem you want to solve ([related reading about the XY problem](https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem))? – Some programmer dude Dec 23 '18 at 14:58
  • @melpomene Fixed, thanks. – St.Antario Dec 23 '18 at 14:58
  • And can't you solve whatever problem you have by having *two* structures? – Some programmer dude Dec 23 '18 at 14:59
  • 2
    @Someprogrammerdude The purpose is to find a formal definition of the behavior in the standard. I thought that the language-lawyer tag suggests that. – St.Antario Dec 23 '18 at 15:00
  • So plain curiosity then? It's not related to another question here on SO? Or some other code you found? I'm just asking to establish some context. And plain curiosity is as good reason as any other to ask something like this. – Some programmer dude Dec 23 '18 at 15:06
  • offsetof returns the offset of field b in the structure test. If b were at the first position in the structure, it would return 0. You then allocate that many bytes. I guess that's not intended. – Stephan Schlecht Dec 23 '18 at 15:10
  • @Someprogrammerdude Just curious, yes. The question arose when I was learning about structure members in the Standard (Section 6.5.2.3) – St.Antario Dec 23 '18 at 15:11
  • _it works fine_, really ? if you set `t->c` you go after the allocated memory. You do NOT allocate a _test_ – bruno Dec 23 '18 at 15:13
  • 2
    @StephanSchlecht Why of the field b? I thought `offsetof(test, c)` returns the actual number of bytes (including padding) in the `struct test` layout before `c` – St.Antario Dec 23 '18 at 15:15
  • @bruno Yes. I mentioned that the object pointed to by t is not a valid test. But I did not use c. – St.Antario Dec 23 '18 at 15:15
  • oops, looking again at the code you are right, my fault – Stephan Schlecht Dec 23 '18 at 15:16
  • 1
    @bruno *You do NOT allocate a `test`* Exactly. There is no `struct test` being allocated. This is clearly UB. – Andrew Henle Dec 23 '18 at 15:16
  • What is the _real_ / _final_ goal of doing that ? That code is probably just a proposal for a given requirement, what is that requirement ? – bruno Dec 23 '18 at 15:20
  • 5
    @AndrewHenle - To be fair you never allocate a type, you allocate storage. In this case, the OP allocated not enough storage, but they never access it out of bounds ostensibly. So it's a decent language-lawyer question. – StoryTeller - Unslander Monica Dec 23 '18 at 15:27
  • @bruno: If later fields would hold meaningful information for some structure instances but not others (one or more bits in earlier members may indicate whether the later members are used), adjusting the allocation size based upon which fields are used could save a lot of storage. Historically, a common approach would have been to cast pointers to longer and shorter structure types, exploiting the Common Initial Sequence guarantees, but gcc's interpretation of N1570 doesn't allow for such an approach. – supercat Jan 01 '19 at 20:25
  • Related, showing that gcc treats this as UB: https://stackoverflow.com/questions/46522451/why-is-gcc-allowed-to-speculatively-load-from-a-struct – Nate Eldredge Feb 21 '21 at 07:00

2 Answers

7

From C2011, paragraph 6.2.6.1/4:

Values stored in non-bit-field objects of any other object type consist of n × CHAR_BIT bits, where n is the size of an object of that type, in bytes.

Therefore, since the allocated object in your code is smaller than the size of a struct test, it cannot contain a value of an object of that type.
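
A minimal sketch of that size comparison, reusing the struct from the question. The helper names `bytes_allocated` and `bytes_required` are hypothetical and exist only for illustration. The exact values depend on the implementation's type sizes and padding, but the first is always smaller than the second because `c` itself occupies storage.

```c
#include <stddef.h>

typedef struct test {
    int a;
    long b;
    int c;
} test;

/* Hypothetical helper: the number of bytes the question passes to malloc. */
size_t bytes_allocated(void)
{
    return offsetof(test, c);
}

/* Hypothetical helper: the number of bytes a complete struct test occupies. */
size_t bytes_required(void)
{
    return sizeof(test);
}
```

Since `bytes_allocated() < bytes_required()` holds on any implementation, the allocated storage cannot hold the representation of a `struct test` value.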

Now consider your expression t -> b = 100. C2011, paragraph 6.5.2.3/4 defines the behavior of the -> operator:

A postfix expression followed by the -> operator and an identifier designates a member of a structure or union object. The value is that of the named member of the object to which the first expression points [...].

(Emphasis added.) We've established that your t does not (indeed, cannot) point to a struct test, however, so the best we can say about 6.5.2.3/4 is that it does not apply to your case. There being no other definition of the behavior of the -> operator, we are left with paragraph 4/2 (emphasis added):

If a "shall" or "shall not" requirement that appears outside of a constraint or runtime-constraint is violated, the behavior is undefined. Undefined behavior is otherwise indicated in this International Standard by the words "undefined behavior" or by the omission of any explicit definition of behavior.

So there you are. The behavior of your code is undefined.
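
For contrast, here is a minimal sketch of a variant that the quoted paragraphs do cover, assuming the goal is simply to store a value in `b`: allocate storage for a complete `test`, so that `t` can point to an object of that type. The helper name `make_test` is hypothetical and not part of the question.

```c
#include <stdlib.h>

typedef struct test {
    int a;
    long b;
    int c;
} test;

/* Hypothetical helper: allocates storage large enough for a complete
   struct test and initializes every member. Returns NULL on failure. */
test *make_test(long b_value)
{
    test *t = malloc(sizeof *t);   /* full size, not offsetof(test, c) */
    if (t == NULL)
        return NULL;
    t->a = 0;
    t->b = b_value;
    t->c = 0;
    return t;
}
```

The caller owns the returned pointer and releases it with `free`.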

John Bollinger
  • 1
    The only issue I take with bringing `->` is that the pointed to memory doesn't have a type yet, according to the effective type rule. So If we apply this logic, even when there is enough memory allocated, there isn't a `struct test` object there. – StoryTeller - Unslander Monica Dec 23 '18 at 15:39
  • ... Which means all non-trivial programs have UB? I don't think it's the conclusion we want to reach. – StoryTeller - Unslander Monica Dec 23 '18 at 15:41
  • @StoryTeller, [paragraph 6.5/6](https://port70.net/~nsz/c/c11/n1570.html#6.5p6) plays in that space when the allocated object is in fact large enough. Basically, the (effective) type of the allocated object is then established by the access to it, but that cannot work in the OP's case. – John Bollinger Dec 23 '18 at 15:43
  • This feels flimsy even when there is enough storage. If there was no access with an lvalue of `struct test` then there is no object there, and therefore 6.5.2.3/4 is on shaky grounds regardless. Maybe it's a defect, IDK. – StoryTeller - Unslander Monica Dec 23 '18 at 15:51
  • @StoryTeller, if you're inclined to reject the premise that the `->` operation, or at least its combination with `=`, constitutes an access to `*t`, so as to (attempt to) establish the effective type of that object pursuant to 6.5/6, then indeed you will perceive a much larger problem (which does not invalidate my conclusion). But I think there's a clearer and stronger issue around 6.5.2.3/4. – John Bollinger Dec 23 '18 at 16:07
  • I'd argue that rejecting or accepting the premise of `->` is in fact crucial to your argument (and may in fact invalidate it). With that premise in-hand your argument stands easily. Without it, there must be other wording (I hope) that makes `t->m` valid even though the complete storage is not of effective type `test`. If such wording exists, then it must also apply (or not, due to other reasons) to the OP's code with incomplete storage. Anyway, that's my 2c. – StoryTeller - Unslander Monica Dec 23 '18 at 16:14
  • Makes sense to me. Thanks. – St.Antario Dec 23 '18 at 17:02
  • @JohnBollinger: If one accepts the premise that the whole purpose of N1570 6.5p7 and its descendants was to say what things may *alias* (which is what the footnote says), and that it wasn't intended to be applied in cases that don't involve aliasing, then the need for the "effective type" concept will go away. If a pointer gets cast to a `FOO*`, and every lvalue that will ever be used to access the storage will be derived from that `FOO*`, that pointer isn't going to alias anything that has accessed the storage previously (because nothing that accessed it previously will do so again), so... – supercat Dec 24 '18 at 16:54
  • ...nothing should need to know or care when the "type" of the storage becomes `FOO`. Only if lvalues are used in a fashion that would alias should types matter for objects whose storage is owned by the application (all types in C, or standard-layout types in C++). – supercat Dec 24 '18 at 16:56
-2

since the pointer returned by malloc is completely valid.

No, the pointer is not "completely valid". Not at all.

Why do you think the pointer is "completely valid"? You didn't allocate enough bytes to hold an entire struct test - the pointer is not "completely valid" as there isn't a valid struct test object for you to access.

There's no such thing as a partial object in C. That's why you can't find it in the C standard.
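
If the intent is only to keep the leading members in a short buffer, one way to stay within defined behavior is to avoid forming a `test` lvalue at all and copy the member's bytes to its offset with `memcpy`. This is a sketch under that assumption, not something proposed in the question; `make_prefix` and `read_b` are hypothetical names.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct test {
    int a;
    long b;
    int c;
} test;

/* Hypothetical helper: allocates only the bytes that precede member c and
   stores b_value at the offset member b would have. Returns NULL on failure. */
unsigned char *make_prefix(long b_value)
{
    unsigned char *buf = malloc(offsetof(test, c));
    if (buf == NULL)
        return NULL;
    /* Byte-wise copy into the buffer; no lvalue of type test is used. */
    memcpy(buf + offsetof(test, b), &b_value, sizeof b_value);
    return buf;
}

/* Hypothetical helper: reads the stored value back, again byte-wise. */
long read_b(const unsigned char *buf)
{
    long value;
    memcpy(&value, buf + offsetof(test, b), sizeof value);
    return value;
}
```

The copy stays inside the allocation because `offsetof(test, b) + sizeof(long)` cannot exceed `offsetof(test, c)`.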

It works fine

No, it doesn't.

"I didn't observe it blowing up." is not the same as "It works fine."

Your code doesn't do anything observable. Per the as-if rule the compiler is free to elide the entire thing and just return zero from main().

Andrew Henle
  • The problem is that putting `printf("t -> b = %ld\n", t -> b);` also works... That's why I was not sure if it's okay... – St.Antario Dec 23 '18 at 15:19
  • @St.Antario visibly you have a very specific definition of the word _work_ (independently of the fact you wanted to say `printf("t -> b = %ld\n", t->b);` ) – bruno Dec 23 '18 at 15:23
  • @bruno Thanks, fixed. I meant that the program does not crash and prints what is expected. – St.Antario Dec 23 '18 at 15:25
  • 2
    @St.Antario Just because you do not see a crash in your case does not mean your program is correct ... and it is not correct. In the same way, I can survive Russian roulette, but that does not mean the game is not insane – bruno Dec 23 '18 at 15:27
  • So the UB was triggered when I cast `void*` to `test*`? – St.Antario Dec 23 '18 at 15:27
  • 1
    Yes, not enough bytes were allocated. But no, nothing was accessed out of bounds on the face of it. So where is the standardese to justify the threat of nasal demons? – StoryTeller - Unslander Monica Dec 23 '18 at 15:29
  • 2
    @St.Antario "It didn't crash" doesn't mean it worked OK. Simple programs often can "get away" with misusing `malloc()`'d memory because they don't do enough to corrupt memory badly enough to crash the program. Also, because of the way `malloc()` works, there's often a bit of "extra" data at the end of each allocated object, so an overwrite of a few bytes often goes unnoticed. – Andrew Henle Dec 23 '18 at 15:29
  • 2
    @StoryTeller *So where is the standardese to justify the threat of nasal demons?* You have it backwards. Show in the standard where this will **not** produce nasal demons. If you can't find in the standard where it's defined behavior, it's not defined behavior. – Andrew Henle Dec 23 '18 at 15:30
  • It is correct that the burden of proof lies on the side of proving validity, not *in*validity. But the fact is that if `malloc` succeeds then it produces a valid object pointer, and that C unconditionally permits that pointer to be converted to type `struct test *`. That having been done, the resulting value may certainly be assigned to a variable of its own (pointer) type. The problem is not in any of that; it is in the subsequent *use* of that pointer. – John Bollinger Dec 23 '18 at 17:31