1

I'm trying to understand a problem we recently cleared when using Clang 5.0 and Undefined Behavior Sanitizer (UBsan). We have code that processes a buffer in either the forward or the backward direction. The reduced case is similar to the code shown below.

The 0-inc may look a little unusual, but it is needed for early Microsoft .Net compilers. Clang 5.0 and UBsan produced integer overflow findings:

adv-simd.h:1138:26: runtime error: addition of unsigned offset to 0x000003f78cf0 overflowed to 0x000003f78ce0
adv-simd.h:1140:26: runtime error: addition of unsigned offset to 0x000003f78ce0 overflowed to 0x000003f78cd0
adv-simd.h:1142:26: runtime error: addition of unsigned offset to 0x000003f78cd0 overflowed to 0x000003f78cc0
...

Lines 1138, 1140, 1142 (and friends) are the increment, which may stride backwards due to the 0-inc:

ptr += inc;

According to "Pointer comparisons in C. Are they signed or unsigned?" (which also discusses C++), pointers are neither signed nor unsigned. Our offsets were unsigned and we relied on unsigned integer wrap to achieve the reverse stride.
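
Concretely, the wrap we relied on looks something like this (a minimal sketch, assuming a 64-bit size_t; the value prints differently elsewhere):

#include <cstddef>
#include <cstdint>
#include <cstdio>
using namespace std;

int main()
{
    size_t inc = 16;
    inc = 0-inc;            // wraps to 2^64 - 16, i.e. SIZE_MAX - 15
    printf("%zx\n", inc);   // prints fffffffffffffff0 with a 64-bit size_t
    return 0;
}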

The code was fine under GCC's UBsan, and under Clang 4 and earlier UBsan. We eventually cleared it for Clang 5.0 with help from the LLVM devs: instead of size_t we needed to use ptrdiff_t (a sketch of the change follows the example below).

My question is, where was the integer overflow/undefined behavior in the construction? How did ptr + <unsigned> result in signed integer overflow and lead to undefined behavior?


Here is an MCVE that mirrors the real code.

#include <cstddef>
#include <cstdint>
using namespace std;

uint8_t buffer[64];

int main(int argc, char* argv[])
{
    uint8_t * ptr = buffer;
    size_t len = sizeof(buffer);
    size_t inc = 16;

    // This sets up processing the buffer in reverse.
    //   A flag controls it in the real code.
    if (argc%2 == 1)
    {
        ptr += len - inc;
        inc = 0-inc;   // unsigned wrap: inc becomes SIZE_MAX - 15
    }

    while (len > 16)
    {
        // process blocks
        ptr += inc;
        len -= 16;
    }

    return 0;
}
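
For completeness, here is a minimal sketch of the ptrdiff_t change that cleared the finding (it mirrors the MCVE above, not the exact patch):

#include <cstddef>
#include <cstdint>
using namespace std;

uint8_t buffer[64];

int main(int argc, char* argv[])
{
    uint8_t * ptr = buffer;
    size_t len = sizeof(buffer);
    ptrdiff_t inc = 16;        // signed increment instead of size_t

    // Reverse processing: start at the last block and step backwards.
    if (argc%2 == 1)
    {
        ptr += len - inc;
        inc = -inc;            // simply -16; no unsigned wrap involved
    }

    while (len > 16)
    {
        // process blocks
        ptr += inc;            // pointer plus a signed offset stays inside buffer
        len -= 16;
    }

    return 0;
}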
jww
  • Pointers themselves are neither signed nor unsigned. But pointer addition is signed, since the pointer can be incremented or decremented when it points to somewhere inside an array. Nothing further can be said based on the information in this question, as it is insufficient to determine whether or not there's undefined behavior, unless a [mcve] is provided. – Sam Varshavchik Dec 18 '17 at 00:05
  • If you have a pointer to the 3rd element of an array, you can add -1 to it and end up with a pointer to the 2nd element in the array. Pointer addition is signed. – Sam Varshavchik Dec 18 '17 at 00:28
  • I recall a few months ago (or maybe last year) a live bug in something important for the same problem as in this MCVE – M.M Dec 18 '17 at 03:12

2 Answers

2

The definition of adding an integer to a pointer is (N4659 expr.add/4):

    When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the expression P points to element x[i] of an array object x with n elements, the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) element x[i + j] if 0 ≤ i + j ≤ n; otherwise, the behavior is undefined.

(The formatting of the condition 0 ≤ i + j ≤ n matters here and is discussed below.)

Note that this is a new wording which replaces a less-clear description from previous standards.

In your code (when argc is odd) we end up with code equivalent to:

uint8_t buffer[64];
uint8_t *ptr = buffer + 48;
ptr = ptr + (SIZE_MAX - 15);

For the variables in the standard quote applied to your code, i is 48 and j is (SIZE_MAX - 15) and n is 64.

The question now is whether it is true that 0 ≤ i + j ≤ n. If we interpret "i + j" to mean the result of the expression i + j, then that equals 32, which is less than n. But if it means the mathematical result, then it is much greater than n.

The standard typesets this condition in the font it uses for mathematical expressions, not the font it uses for source code; ≤ is not a valid C++ operator either. So I think they intend this condition to describe the mathematical value, i.e. this is undefined behaviour.
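
To see the two readings side by side, here is a minimal sketch (assuming a 64-bit size_t; i, j and n take the values given above):

#include <cstddef>
#include <cstdint>
#include <cstdio>
using namespace std;

int main()
{
    size_t i = 48;             // offset of ptr within buffer
    size_t j = SIZE_MAX - 15;  // the wrapped increment
    size_t n = 64;             // number of elements in buffer

    size_t wrapped = i + j;    // the C++ expression wraps modulo 2^64
    printf("%zu %d\n", wrapped, wrapped <= n);   // prints "32 1"

    // The mathematical value of i + j is 48 + 2^64 - 16, far greater
    // than n, so under that reading the pointer addition is undefined.
    return 0;
}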

M.M
  • For a rationale: imagine you were on a system with 16-bit `size_t` but 32-bit pointers. It would not be possible for a compiler to replace `ptr += 65540` with `ptr -= 16` since the first one might be a valid addition within a maximum-sized object – M.M Dec 18 '17 at 03:34
  • On systems where `size_t` and `ptrdiff_t` have different sizes, that would be true. Segmented-mode 8086 was interesting because objects could be over 32K but pointer subtraction generally yielded a 16-bit signed value. If two pointers identified portions of an object 60000 bytes apart, the difference would be -5536, but adding -5536 to one pointer would yield the other. – supercat Dec 18 '17 at 21:36
  • @M.M. - Sorry to dig up an old question. I am not sure this is equivalent to the original program: `ptr = ptr + (SIZE_MAX - 15);`. `inc = 0-inc;` was its own statement and used integer wrap. Wrap is well defined, and the expression completed before it was used in the `ptr = ptr + ...`. `ptr + ` is still within the original array. – jww Jul 12 '18 at 14:09
1

The C standard defines type ptrdiff_t as being the type yielded by the pointer-difference operator. It would be possible for a system to have a 32-bit size_t and a 64-bit ptrdiff_t; such definitions would be a natural fit for a system which used 64-bit linear or quasi-linear pointers but required individual objects to be less than 4GiB each.

If objects are known to be less than 2GiB each, storing values of type ptrdiff_t rather than size_t might make the program needlessly inefficient. In such a scenario, however, code should not use size_t to hold pointer differences that might be negative, but should instead use int32_t [which will be large enough if objects are less than 2GiB each]. Even if ptrdiff_t is 64 bits, a value of type int32_t will be properly sign-extended before it is added to or subtracted from any pointer.
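
A minimal sketch of that last point (hypothetical buffer and offsets; it assumes objects are under 2GiB, so the offset fits in an int32_t):

#include <cstdint>
#include <cassert>
using namespace std;

int main()
{
    unsigned char buf[64];
    unsigned char *p = buf + 48;

    int32_t back = -16;            // negative difference held in a 32-bit signed type
    unsigned char *q = p + back;   // converted (sign-extended) before the pointer arithmetic

    assert(q == buf + 32);         // still points at a valid element of buf
    return 0;
}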

supercat