For the code snippet below,

int *p = (int *)malloc(10);
int *q = (int *)malloc(10);
ptrdiff_t x = q - p;
printf("Hello World %td", x);

I understand why a compiler might not throw an error for the pointer subtraction as the operation is undefined, not illegal.

What I do not understand, however, is why no warning is produced for this situation. I've tried VS2017, gcc 7.1.1 and clang so far, to no avail.

Stuff I already went through

  1. What exactly is a C pointer if not a memory address?

  2. From this answer and the cited section 6.5.6 of N1570, it is quite clear that simply subtracting two pointers does not produce any meaningful result.

The statement in N1570 that, for pointers to array members, the result is the difference of the subscripts of the two array elements does make sense, so I am not contesting the legality of the subtraction.

I'm quite sure the compiler understands that p and q are not pointers to array members and could easily warn an unsuspecting user about the problem. Why don't they?

Even if they don't, is there a flag/compiler option for VS, gcc or clang that could catch this potential source of error?

Neither gcc -Wall -Wextra nor /Wall in VS detects this.

Nicol Bolas
R S
  • Well... Who said `malloc` doesn't return pointers into the same array? – StoryTeller - Unslander Monica Oct 08 '18 at 15:20
  • The compiler doesn't know the semantics of what malloc returns - to it, malloc is just another function. –  Oct 08 '18 at 15:21
  • Sounds like you are looking for a static analyzer. That may catch this. As far as the compiler cares, you've written valid syntax. – NathanOliver Oct 08 '18 at 15:21
  • Compilers *could* detect this. They don't. It is a job of a static analyzer. – SergeyA Oct 08 '18 at 15:22
  • Such flow analysis is really quite hard and complex. Maybe not for a simple case like the one shown, but in the generic case for any two pointers it's not really realistic to expect a compiler to be able to do it. A static analysis tool might be able to do it, but then that's part of its main work, unlike a compiler. – Some programmer dude Oct 08 '18 at 15:23
  • Yeah, for now a static analyzer looks like the way to go. Thanks Nathan, Sergey and 'Some programmer dude'! – R S Oct 08 '18 at 15:42
  • @NeilButterworth: Good compilers do know the semantics of `malloc`. For example, in `int b = 4; int main(void) { malloc(1); printf("%d\n", b); }`, Apple LLVM 9.1.0 (clang-902.0.39.2) removes the `malloc` call, as it knows it does not modify `b`. – Eric Postpischil Oct 08 '18 at 17:15
  • @Eric Did you read what I wrote? "The compiler doesn't know the semantics of what malloc returns" –  Oct 08 '18 at 17:58
  • @NeilButterworth: Why do you think compiler writers would not incorporate knowledge about `malloc` into compilers? – Eric Postpischil Oct 08 '18 at 18:01
  • @NeilButterworth: In `int b = 4; int main(void) { int *a = malloc(sizeof *a); *a = 3; printf("%d %d\n", *a, b); free(a); }`, LLVM produces code with no call to `malloc`. This implies it knows the return value of `malloc` does not point to `b`. Yes, the compiler does know some semantics about what `malloc` returns. – Eric Postpischil Oct 08 '18 at 18:06
  • @Eric Because in general they can't. malloc is simply a function in the C Standard Library. It is perfectly OK for me to provide my own implementation of the C Standard Library that the compiler writers know nothing about. This is actually a not uncommon thing to do, with people providing library implementations with extra debugging information, taking advantage of specific hardware, etc. etc. –  Oct 08 '18 at 18:06
  • @NeilButterworth: `malloc` is not **simply** a function in the C standard library. It is a function with a particular specification. If you provide your own implementation, it must (if the result is to be a conforming C implementation) conform to the specification in the standard. Therefore, compiler writers may (for purposes of satisfying the C standard) assume that `malloc` behaves as specified in the C standard. – Eric Postpischil Oct 08 '18 at 18:09
  • @Eric Have you ever read the C Language Standard? The specification for malloc is incredibly loose. About the only thing it says is that you must be able to call free on a pointer it returns. How it may work (and there are many implementations of that around) is not specified in the Standard. And yes, it is "simply a function in the C standard library". –  Oct 08 '18 at 18:11
  • @NeilButterworth: The standard says `malloc` “allocates space for an object whose size is specified by size and whose value is indeterminate.” The fact that it *allocates* space means that each pointer returned from `malloc` that is not subsequently freed points to different space than another pointer returned and not freed. Therefore, in OP’s `int *p = (int *)malloc(10); int *q = (int *)malloc(10);`, the compiler may assume that `p` and `q` point to different objects and do not satisfy the requirements for `p-q` being defined. – Eric Postpischil Oct 08 '18 at 18:14
  • @Eric Did you read StoryTeller's first comment here? It's perfectly possible (and may possibly be the case on embedded systems) that p and q are pointers into a large, statically allocated array, and that p - q is perfectly valid, as they are pointers into the same array. –  Oct 08 '18 at 18:18
  • @NeilButterworth: StoryTeller is not a normative part of the C standard. Perhaps a compiler could treat memory provided by `malloc` as part of one large array (although that contrasts with issues of effective type in 6.5 6), that is merely a choice a compiler could possibly make and has no bearing on compilers that do not choose it. As I wrote, per the rules of C, a compiler may assume that `p` and `q` point to different objects. Furthermore, my second example above proves that LLVM does assume something about the value returned by `malloc`. – Eric Postpischil Oct 08 '18 at 18:25
  • @NeilButterworth: Here is another: LLVM compiles `int *a = malloc(sizeof *a); int *b = malloc(sizeof *b); *a = 3; *b = 4; printf("%d %d\n", *a, *b); free(a); free(b);` into code with no `malloc`. If it knew nothing about `malloc`, it would have to assume the value returned by `malloc` might point to the same place for both calls, in which case the `printf` would print “4 4”. But LLVM generates code that prints “3 4”. This shows LLVM knows that these two `malloc` calls return pointers to different locations. – Eric Postpischil Oct 08 '18 at 18:27

1 Answer

Checking this reliably with a static analyzer is unlikely to be useful for any but the most trivial cases. That doesn't mean GCC offers no way to verify that your program is correct in general. It's just that such checks require instrumentation at run time and are somewhat costly, so they aren't enabled by default.

If you visit the GCC documentation, the section about "Program Instrumentation Options" has these two useful options documented:

-fsanitize=address

Enable AddressSanitizer, a fast memory error detector. Memory access instructions are instrumented to detect out-of-bounds and use-after-free bugs. The option enables -fsanitize-address-use-after-scope. See https://github.com/google/sanitizers/wiki/AddressSanitizer for more details. The run-time behavior can be influenced using the ASAN_OPTIONS environment variable. When set to help=1, the available options are shown at startup of the instrumented program. See https://github.com/google/sanitizers/wiki/AddressSanitizerFlags#run-time-flags for a list of supported options. The option cannot be combined with -fsanitize=thread.

-fsanitize=pointer-subtract

Instrument subtraction with pointer operands. The option must be combined with either -fsanitize=kernel-address or -fsanitize=address. The option cannot be combined with -fsanitize=thread. Note: By default the check is disabled at run time. To enable it, add detect_invalid_pointer_pairs=2 to the environment variable ASAN_OPTIONS. Using detect_invalid_pointer_pairs=1 detects invalid operation only when both pointers are non-null.

Now, your example is trivial, so if we follow the guide and instrument it accordingly:

export ASAN_OPTIONS=${ASAN_OPTIONS}:"detect_invalid_pointer_pairs=1"
gcc --std=c99 -Wall -pedantic -fsanitize=address -fsanitize=pointer-subtract -O0 main.c
./a.out

The output we get from running our program is very telling of the undefined behavior in your code:

=================================================================
==24090==ERROR: AddressSanitizer: invalid-pointer-pair: 0x602000000030 0x602000000010
    #0 0x4008bb in main (/tmp/1539013071.1731968/a.out+0x4008bb)
    #1 0x7f8cfe10182f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
    #2 0x4007d8 in _start (/tmp/1539013071.1731968/a.out+0x4007d8)

0x602000000030 is located 0 bytes inside of 10-byte region [0x602000000030,0x60200000003a)
allocated by thread T0 here:
    #0 0x7f8cfe5992b0 in __interceptor_malloc ../../.././libsanitizer/asan/asan_malloc_linux.cc:86
    #1 0x4008a4 in main (/tmp/1539013071.1731968/a.out+0x4008a4)
    #2 0x7f8cfe10182f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)

0x602000000010 is located 0 bytes inside of 10-byte region [0x602000000010,0x60200000001a)
allocated by thread T0 here:
    #0 0x7f8cfe5992b0 in __interceptor_malloc ../../.././libsanitizer/asan/asan_malloc_linux.cc:86
    #1 0x400896 in main (/tmp/1539013071.1731968/a.out+0x400896)
    #2 0x7f8cfe10182f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)

SUMMARY: AddressSanitizer: invalid-pointer-pair (/tmp/1539013071.1731968/a.out+0x4008bb) in main
==24090==ABORTING
StoryTeller - Unslander Monica