6

Say I have an array like so:

int val[10];

and I intentionally index it with everything from negative values to anything higher than 9, but WITHOUT using the resulting value in any way. This would be for performance reasons (perhaps it's more efficient to check the input index AFTER the array access has been made).

My questions are:

  1. Is it safe to do so, or will I run into some sort of memory protection barriers, risk corrupting memory or similar for certain indices?
  2. Is it perhaps not at all efficient if I access data out of range like this? (assuming the array has no built in range check).
  3. Would it be considered bad practice? (assuming a comment is written to indicate we're aware of using out of range indices).
DaedalusAlpha
  • Sorry, I cannot quite follow you. Do you mean you access it like val[-2] and val[15]? – RvdK Sep 13 '13 at 08:39
  • How do you perform an access without using the value? Assigning the value to a local variable or using it as a function argument *is* using it. – Angew is no longer proud of SO Sep 13 '13 at 08:42
  • A similar question about the risks: http://stackoverflow.com/questions/15646973/how-dangerous-is-it-to-access-an-array-out-of-bounds?rq=1 – Narkha Sep 13 '13 at 08:43
  • It most certainly crashes. And that's about the best thing that can happen if you don't like getting random behaviours. – firefrorefiddle Sep 13 '13 at 08:44
  • @Narkha, thanks, I didn't see that post when I searched for similar questions. – DaedalusAlpha Sep 13 '13 at 08:48
  • @Mike Hartl, I don't know; every time I've accessed an array out of bounds (by mistake) I just got some random junk, not a crash. – DaedalusAlpha Sep 13 '13 at 08:49
  • http://www.cplusplus.com/forum/articles/10/ – x29a Sep 13 '13 at 08:49
  • @DaedalusAlpha I've had one case where it crashed, and one case where the "random junk" caused wrong results, but only on one machine, with one data set (out of hundreds of machines and hundreds of data sets). – James Kanze Sep 13 '13 at 09:15

4 Answers

17

It is undefined behavior. By definition, undefined means "anything could happen." Your code could crash, it could work perfectly, it could bring about peace and harmony amongst all humans. I wouldn't bet on the second or the last.

verbose
  • It could also spark WWIII, so I would avoid undefined behavior ;) – It'sPete Sep 13 '13 at 09:06
  • While at first I found it hard to imagine what could make a program crash just by reading past an array, a friend of mine pointed out that if the array is, for example, the last thing in a page, you don't have to read many indices past it before you cross into the next page and crash on a page fault. Just pointing this out as an illustration. – DaedalusAlpha Sep 13 '13 at 12:57
  • @It'sPete: It could also go back in time and prevent WWII ;-) – Cassio Neri Sep 13 '13 at 13:04
  • @DaedalusAlpha: On an Apple II computer with a floppy controller in slot 6, reading 0xC0EF while a drive is spinning will turn on the drive's write circuitry, thus overwriting any portion of a disk that moves under the head (likely wiping out whatever track was accessed last). From the hardware's point of view, there's nothing "erroneous" about such an action, but it may contribute to the notion that UB may wipe out disks. – supercat May 24 '21 at 17:16
  • By "definition", when the Standard uses the term Undefined Behavior, it means nothing more nor less than that the *Standard* imposes no requirements. The authors of the Standard explicitly recognized that in many cases where the Standard imposes no requirements, some implementations may usefully process code "in a documented fashion characteristic of the environment". Whether "anything" can happen would depend upon whether the environment has a characteristic behavior, and whether an implementation is designed to allow programmers to exploit such behaviors when defined by the environment. – supercat May 24 '21 at 17:26
10

It is Undefined Behavior, and you might actually run afoul of the optimizers.

Imagine this simple code example:

#include <stdexcept>

int select(int i) {
    int values[10] = { ... };

    // The (possibly out-of-range) read happens here, before the check below.
    int const result = values[i];

    if (i < 0 or i > 9) throw std::out_of_range("out!");

    return result;
}

And now look at it from an optimizer point of view:

  • int values[10] = { ... };: valid indexes are in [0, 9].

  • values[i]: i is an index, thus i is in [0, 9].

  • if (i < 0 or i > 9) throw std::out_of_range("out!");: i is in [0, 9], never taken

And thus the function rewritten by the optimizer:

int select(int i) {
    int values[10] = { ... };

    return values[i];
}

For more amusing stories about forward and backward propagation of assumptions based on the fact that the developer is not doing anything forbidden, see What every C programmer should know about Undefined Behavior: Part 2.
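For contrast, here is a minimal sketch of the same function with the range check moved ahead of the access, so the subscript is only ever evaluated for a valid i. The name select_checked and the table contents are placeholders chosen for illustration; they are not part of the original example.

```cpp
#include <stdexcept>

// Hypothetical counterpart to select(): validate the index first, then read.
// The table contents are illustrative placeholders.
int select_checked(int i) {
    static const int values[10] = {0, 10, 20, 30, 40, 50, 60, 70, 80, 90};

    // The check precedes the access, so values[i] is never evaluated with an
    // out-of-range index and the optimizer has no grounds to drop the test.
    if (i < 0 || i > 9) throw std::out_of_range("out!");

    return values[i];
}
```

With this ordering the optimizer can no longer infer anything about i from the array access, because the access is only reached on the path where the check has already passed.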

EDIT:

Possible work-around: if you know that you will access from -M to +N you can:

  • declare the array with appropriate buffer: int values[M + 10 + N]
  • offset any access: values[M + i]
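
A rough sketch of that work-around, assuming illustrative bounds M = 4 and N = 6 and reading "+N" as N extra slots after the ten real elements. The wrapper name PaddedValues and the member at_offset are hypothetical names introduced here.

```cpp
// Hypothetical slack sizes: logical indices in [-M, 10 + N) are allowed.
constexpr int M = 4;   // slots before logical index 0
constexpr int N = 6;   // slots after logical index 9

struct PaddedValues {
    int storage[M + 10 + N] = {};  // a single array object, zero-initialized

    // Logical index i maps to physical index M + i, which stays inside
    // storage for every i in [-M, 10 + N), so the access is well defined.
    int& at_offset(int i) { return storage[M + i]; }
};
```

Because everything lives in one array object, reading at_offset(-1) or at_offset(10) touches real elements of storage rather than memory outside an array. Indices beyond the padded range are still out of bounds.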
Matthieu M.
5

As verbose said, this yields undefined behavior. A bit more precision follows.

5.2.1/1 says

[...] The expression E1[E2] is identical (by definition) to *((E1)+(E2))

Hence, val[i] is equivalent to *((val)+(i)). Since val is an array, the array-to-pointer conversion (4.2/1) occurs before the addition is performed. Therefore, val[i] is equivalent to *(ptr + i) where ptr is an int* set to &val[0].

Then, 5.7/2 explains what ptr + i points to. It also says (emphasis mine):

[...] If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.

In the case of ptr + i, ptr is the pointer operand and the result is ptr + i. According to the quote above, both should point to an element of the array or to one past the last element. That is, in the OP's case ptr + i is a well defined expression for all i = 0, ..., 10. Finally, *(ptr + i) is well defined for 0 <= i < 10 but not for i = 10.
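
As an illustration of these limits, the sketch below forms the one-past-the-end pointer ptr + 10 and uses it only for comparison, dereferencing only ptr + i with 0 <= i < 10. The function name sum10 is made up for this example.

```cpp
// Sums the ten elements by walking a pointer up to, but never
// dereferencing, the one-past-the-end position.
int sum10(const int (&val)[10]) {
    const int* ptr = val;        // array-to-pointer conversion: &val[0]
    const int* end = ptr + 10;   // well defined: one past the last element
    int total = 0;
    for (const int* p = ptr; p != end; ++p) {
        total += *p;             // dereference only for offsets 0..9
    }
    // Forming ptr + 11 or ptr - 1 would already be undefined behavior,
    // even without dereferencing the result.
    return total;
}
```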

Edit:

I'm puzzled as to whether val[10] (or, equivalently, *(ptr + 10)) yields undefined behavior or not (I'm considering C++, not C). In some circumstances it does (e.g. int x = val[10]; is undefined behavior) but in others this is not so clear. For instance,

int* p = &val[10];

As we have seen, this is equivalent to int* p = &*(ptr + 10); which could be undefined behavior (because it dereferences a pointer to one past the last element of val) or the same as int* p = ptr + 10; which is well defined.
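
When only the end pointer is needed, the question can be sidestepped by not writing the subscript at all. The sketch below shows two such spellings; the helper names end_of and end_of_std are hypothetical, while std::end is the standard library function from <iterator>.

```cpp
#include <iterator>

// Returns the one-past-the-end pointer without writing &val[10].
inline int* end_of(int (&val)[10]) {
    return val + 10;        // plain pointer arithmetic, no dereference involved
}

// Same result via the standard library overload for built-in arrays.
inline int* end_of_std(int (&val)[10]) {
    return std::end(val);
}
```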

I found these two references which show how fuzzy this question is:

May I take the address of the one-past-the-end element of an array?

Take the address of a one-past-the-end array element via subscript: legal by the C++ Standard or not?

Cassio Neri
  • If I remember correctly, the `&val[10]` might be one of those cases where C and C++ differ in behavior. `val + 10` is valid in both (1 past the end), but `val[10]` is allowed in C as long as you do not use the value (itself) whereas it is undefined in C++ just to access the value. – Matthieu M. Sep 13 '13 at 12:10
  • Well `6.5.2.1` paragraph *1* says `[...]and the result has type ‘‘type’’.` would seem to disallow `val[10]`. – Shafik Yaghmour Sep 13 '13 at 12:11
  • @MatthieuM. That's my understanding as well. However, I failed to find the explanation in the C++11 standard. Do you know which clause I should look at? – Cassio Neri Sep 13 '13 at 12:56
  • @ShafikYaghmour: You might be correct as far as C is concerned (your quote is from the C standard). My question is whether this is allowed in C++ (sorry I wasn't clear on that). I failed to find a definitive answer in the C++ standard. One of the two references provided above points to a C++ DR that shows the question has been discussed by the committee but AFAIK a definitive answer is still missing :-( – Cassio Neri Sep 13 '13 at 13:00
  • @CassioNeri Hmmm, well C++ draft N3485 in section `5.2.1` paragraph *1* says `[...]The result is an lvalue of type “T.”[...]`. That seems to also imply that accessing out of bounds at least violates the spirit of the standard even if it is not explicitly labeled undefined ... maybe I am reading it wrong though. – Shafik Yaghmour Sep 13 '13 at 13:24
  • @ShafikYaghmour: I think 5.2.1 is, indeed, very relevant for this question. However, I don't know if this is enough. For instance look at [this](http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#232), especially the comment by Tom Plum. This is too much of standardese for me :-( – Cassio Neri Sep 13 '13 at 13:42
  • @CassioNeri: well, reading 5.2.1 you get that `E1[E2]` is by definition equivalent to `*((E1)+(E2))`, and refer to 5.3.1 [expr.unary.op]. I know the problem is that `*` should not be applied to the 1-after-the-end, but I could not find the wording :( – Matthieu M. Sep 13 '13 at 14:43
  • The expression E1[E2] may be defined as meaning `*(E1+E2)`, but there are some cases where the Standard would simultaneously say that the former is undefined but imply that the latter should be defined. For example, given `char arr[2][8]`, gcc may process `arr[0][i]` in ways that would fail if `i` exceeds 7, but seems to process `*(arr[0]+i)` in a manner that allows the entire 2d array to be indexed via `i`. – supercat May 24 '21 at 16:16
3

If you put it in a structure with some padding ints, it should be safe (since the pointer actually points to "known" destinations). But it's better to avoid it.

struct SafeOutOfBoundsAccess
{ 
    int paddingBefore[6];
    int val[10];
    int paddingAfter[6];
};

void foo()
{
    SafeOutOfBoundsAccess a;
    bool maybeTrue1 = a.val[-1] == a.paddingBefore[5];
    bool maybeTrue2 = a.val[10] == a.paddingAfter[0];
}
Onur
  • That's a pretty neat idea, if you know how much you might over-run or under-run the index :) – DaedalusAlpha Sep 13 '13 at 08:57
  • It's technically still UB, but I believe on implementations other than Hell++ it will work. – Angew is no longer proud of SO Sep 13 '13 at 08:59
  • @Angew Actually, there have been (and maybe still are) implementations where it would fail. They are sold for development purposes, in order to detect buffer overruns. (And of course, if he later replaces `int val[10]` with `std::vector`, it will fail on a lot of implementations, including VC++ and g++ when the usual debug options are used.) – James Kanze Sep 13 '13 at 09:10
  • @Angew: I believe you're 100% right. Please see my answer and correct me if I'm wrong. – Cassio Neri Sep 13 '13 at 09:19