
I convert the address of an integer to a double pointer and then read through it. The integer is smaller than a double, so the read goes past the end of the object. I believe this is undefined behavior, but I couldn't find the relevant wording in the C standard, so I'm posting this question to confirm my understanding.

#include <stdio.h>
#include <stdint.h>

int main() {
    int32_t a = 12;
    double *p = (double*)(&a);
    printf("%lf\n", *p);
    return 0;
}
wt.cc
  • Yes, it is undefined behavior (at least when `sizeof(double) > sizeof(int32_t)`, which is generally the case). What makes you think it might not be one? – Basile Starynkevitch May 16 '18 at 12:37
  • It would be a *strict aliasing violation* even if the type sizes matched. – user694733 May 16 '18 at 12:38
  • The concept "undefined" does not *necessarily* mean "we don't know what will happen". It means "we haven't **specified** what **should** happen". By design you cannot rely on a specific behavior: not only might the current observable effect depend on specific circumstances, there is also no guarantee that different compilers, versions, environments, etc. will behave the same way. – Lasse V. Karlsen May 16 '18 at 12:40
  • So I know exactly what the CPU is going to try to do: it's going to read a `double`'s worth of bytes and interpret them as a double. What's in those bytes, whether it's allowed to read them, whether they can safely be interpreted as a double, etc.: all of those are **undefined**. – Lasse V. Karlsen May 16 '18 at 12:41
  • Strict aliasing isn't really relevant if there are no writes happening, as there's no changing object state that needs to sync. – Alex Celeste May 16 '18 at 12:53
  • @Leushenko Dereferencing an inappropriate pointer, as in the code `printf("%lf\n", *p);` here, is enough to invoke UB. On hardware with strict alignment restrictions, such code is likely to cause a `SIGSEGV` or `SIGBUS` to be raised. – Andrew Henle May 16 '18 at 12:56
  • @Leushenko It's relevant as soon as the variable is accessed, read or write. For example code like `int i = 0; short* s = (short*)&i; while(*s != something) { i=something; }` may hang in an eternal loop even if the `short` representation of `something` is identical to the `int` representation. – Lundin May 16 '18 at 13:19

3 Answers


It's undefined behavior per C11 6.5 ("the strict aliasing rule"):

6 The effective type of an object for an access to its stored value is the declared type of the object, if any.
...

In this case the effective type is int32_t (which is a typedef corresponding to something like int or long).

7 An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
- a type compatible with the effective type of the object,
...

double is not compatible with int32_t, so when the code accesses the data through *p, it violates this rule and invokes UB.

See What is the strict aliasing rule? for details.

Lundin

From the C99 Committee Draft, 6.5 Expressions, paragraph 7:

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
— a type compatible with the effective type of the object,
— a qualified version of a type compatible with the effective type of the object,
— a type that is the signed or unsigned type corresponding to the effective type of the object,
— a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
— an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
— a character type.

An object of type int has its stored value accessed through an lvalue expression of type double. int and double are not compatible types, double is not an aggregate or union type, and double is not a character type, so none of the listed exceptions applies. Dereferencing a pointer of type double * that points to an object of type int (the expression *p is an lvalue of type double) is therefore undefined behavior. Such an access is called a strict aliasing violation.

KamilCuk
  • You are citing a wiki which seems to be citing the C++ standard (?), not the C standard. C and C++ happen to behave identically in this case, but the source used isn't ideal. – Lundin May 16 '18 at 12:56
  • You are right. I should pay more attention to cppreference next time. – KamilCuk May 16 '18 at 13:21

The Standard does not require that compilers behave predictably if an object of type "int" is accessed using an lvalue that has no visible relation to that type. In the rationale, however, the authors note that the classification of certain actions as Undefined Behavior is intended to allow the marketplace to decide what behaviors are considered necessary in quality implementations. In general, the act of converting a pointer to another type and then immediately performing an access with it falls in the category of actions which will be supported by quality compilers that are configured to be suitable for system programming, but may not be supported by compilers that act obtusely.

Even ignoring the lvalue-type issue, however, the Standard imposes no requirements as to what happens if an application tries to read from memory it does not own. Here again, the choice of behavior may sometimes be a quality-of-implementation issue. There are five main possibilities here:

  1. On some implementations, the contents of the storage might be predictable via means not described by the Standard, and the read would yield the contents of such storage.

  2. The act of reading might behave as though it yields bits with Unspecified values, but have no other side-effect.

  3. The attempted read may terminate the program.

  4. On platforms which use memory-mapped I/O, the out-of-bounds read could perform an unexpected operation with unknown consequences; this possibility is only applicable on certain platforms.

  5. Implementations that try to be "clever" in various ways may try to draw inferences based on the notion that the read cannot occur, thus resulting in side-effects that transcend laws of time and causality.

If you know that your code will be running on a platform where reads have no side-effects, the implementation won't try to be "clever", and your code is prepared for any pattern of bits the read might yield, then under those circumstances such a read may have useful behavior, but you would be limiting the situations where your code could be used.

Note that while implementations that define __STDC_ANALYZABLE__ (C11 Annex L) are required to have most actions obey the laws of time and causality even in cases where the Standard would impose no other requirements, out-of-bounds reads are classified as critical undefined behavior, and should thus be considered dangerous on any implementation that does not expressly specify otherwise.

Incidentally, there's another issue on some platforms which would apply even if e.g. code had used an int[3] rather than a single int: alignment. On some platforms, values of certain types may only be read or written to/from certain addresses, and some addresses which are suitable for smaller types may not be suitable for larger ones. On platforms where int requires 32-bit alignment but double requires 64-bit alignment, given int foo[3], a compiler might arbitrarily place foo so that (double*)foo would be a suitable address for storing a double, or so that (double*)(foo+1) would be a suitable place. A programmer who is familiar with the details of an implementation may be able to determine which address would be valid and exploit that, but code which blindly assumes that the address of foo will be valid may fail if double has a 64-bit alignment requirement.

supercat
  • *In general, the act of converting a pointer to another type and then immediately performing an access with it falls in the category of actions which will be supported by quality compilers that are configured to be suitable for system programming, but may not be supported by compilers that act obtusely.* So any compiler on SPARC or ARM hardware is "obtuse" and not "quality"? – Andrew Henle May 16 '18 at 16:28
  • @AndrewHenle: Are you suggesting I should have mentioned alignment? That would be a fair issue I guess. My main point is that 6.5p7 as written cannot have been reasonably intended to be treated as describing all cases where compilers should behave predictably, since the way it's written doesn't even allow for cases like `struct s {int x;} foo = {0}; foo.x = 1;` [that code clearly modifies the value in an object of type `struct s`, and does so using an lvalue of type `int`, not one of the types listed as suitable for such purpose] and recognition of derived pointers/lvalues is a QoI issue. – supercat May 16 '18 at 17:12
  • @AndrewHenle: As for whether I would regard clang and gcc as quality implementations when invoked with `-fstrict-aliasing`, I would not. The stated purpose of 6.5p7 was to allow compilers to assume seemingly-unrelated things don't alias--not to invite them to ignore obvious relationships among things. Both gcc and clang are prone to optimizing out code which reads storage as T1 and writes back the same bit pattern as T2, even if the storage had last been written as T1 and will next be read as T2. Having done that, they will then treat the preceding write and following read as unordered. – supercat May 16 '18 at 17:42
  • *Are you suggesting I should have mentioned alignment? That would be a fair issue I guess.* Of course. Way too many x86-only programmers don't even think of alignment. Other hardware isn't nearly as forgiving. *As for whether I would regard clang and gcc as quality implementations when invoked with `-fstrict-aliasing`, I would not.* On SPARC, I've seen (by now, older versions of) GCC generate binaries from conforming code that would fail with `SIGBUS`, so yeah, I might have to agree with you there, with or without `-fstrict-aliasing`. – Andrew Henle May 16 '18 at 19:19
  • @AndrewHenle: Do you like what I wrote about alignment? – supercat May 16 '18 at 20:28