20

Why doesn't C++ support range based for loop over dynamic arrays? That is, something like this:

int* array = new int[len];
for[] (int i : array) {};

I just invented the for[] statement to rhyme with new[] and delete[]. As far as I understand, the runtime has the size of the array available (otherwise delete[] could not work) so in theory, range based for loop could also be made to work. What is the reason that it's not made to work?

Balint Laczay
  • 244
  • 2
  • 6
  • 12
    The root of your question really is why C++ doesn't allow you to retrieve the size of a dynamically allocated array. For that, see here: https://stackoverflow.com/questions/36471101/c-doesnt-tell-you-the-size-of-a-dynamic-array-but-why (read the highest voted answer, not the accepted one) – Benjamin Lindley Nov 18 '17 at 07:35
  • 4
    You should change the name `array` for `pointer` and everything will be clearer. – juanchopanza Nov 18 '17 at 08:13
  • 6
    If you want to conveniently iterate over something like an array, you could use one of the c++ std library's containers. [`std::array`](http://en.cppreference.com/w/cpp/container/array) and [`std::vector`](http://en.cppreference.com/w/cpp/container/vector) comes to mind. See [my answer to another question](https://stackoverflow.com/a/44985585/4486783) as an example. – plats1 Nov 18 '17 at 08:57
  • 2
    I hope this is a language-lawyer question. In real code, you'd just use `std::vector` and be done with it... – Christian Hackl Nov 18 '17 at 09:31
  • Closely related: `dynarray`-s. See https://stackoverflow.com/q/17353873/841108 – Basile Starynkevitch Nov 18 '17 at 11:26
  • 1
    Possible duplicate of [Range-based for loop on a dynamic array?](https://stackoverflow.com/questions/15904896/range-based-for-loop-on-a-dynamic-array) – WorldSEnder Nov 18 '17 at 11:33
  • Possible duplicate of [C++ doesn't tell you the size of a dynamic array. But why?](https://stackoverflow.com/questions/36471101/c-doesnt-tell-you-the-size-of-a-dynamic-array-but-why) – underscore_d Nov 18 '17 at 14:02

6 Answers6

14

What is the reason that it's not made to work?

A range based loop like

 for(auto a : y) {
     // ...
 }

is just syntactic sugar for the following expression

 auto endit = std::end(y);
 for(auto it = std::begin(y); it != endit; ++it) {
     auto a = *it;
     // ...
 }

Since std::begin() and std::end() cannot be used with a plain pointer, this can't be applied with a pointer allocated with new[].

As far as I understand, the runtime has the size of the array available (otherwise delete[] could not work)

How delete[] keeps track of the memory block that was allocated with new[] (which isn't necessarily the same size as was specified by the user), is a completely different thing and the compiler most probably doesn't even know how exactly this is implemented.

user0042
  • 7,917
  • 3
  • 24
  • 39
  • @BalintLaczay The size is somehow theoretically known at runtime, at least it's the minimum that was specified with `new[]`. But it's usually the size of a bigger memory block. The only information that will be kept by the runtime is the pointer of the allocated memory block. This highly depends on the OS specific implementation of memory management. – user0042 Nov 18 '17 at 07:36
  • Thank you! Can I recommend that you expand the second part of your answer by: `delete[]` only needs to know the size if it has to make actual destructor calls. For an array of `int`s, the compiler might not even store the size anywhere, just rely on some underlying memory management to free the memory on delete[]. Thus, supporting the `for[]` suggested by the OP would potentially add some overhead by forcing the compiler to store the size even in situations where no destructor calls are need. Maybe also remove the first part of your answer, it does not contribute to answering my question – Balint Laczay Nov 18 '17 at 08:18
  • @BalintLaczay I'm not going to change my answer, it is fine as is. It's you who still seems to confuse arrays with pointers. There's no need that an _extra size info_ is stored for raw arrays, this is known at compile time. – user0042 Nov 18 '17 at 08:22
  • 1
    The runtime only tracks the size if the object has a destructor to call for `delete[]`. Otherwise it doesn't. `delete[]` doesn't get called for deleting a `new char[x]`; `delete` does. – Joshua Nov 18 '17 at 16:35
  • @BalintLaczay: You are fishing in the wrong waters: The array doesn't have to be allocated with `new[]` in the first place! It could be allocated with `malloc` or `alloca`, or even be returned from another `extern "C"` library (which might have its own runtime library!), so there is no way to know if there even is some size info, or where it is if it exists. – hoffmale Nov 18 '17 at 17:31
  • Minor point, your syntactic sugar is off: `std::end` is only called once in a range-based for loop. – Matthieu M. Nov 18 '17 at 17:54
  • @MatthieuM. Thanks for noticing. – user0042 Nov 18 '17 at 17:57
6

When you have this:

int* array = new int[len];

The problem here is that your variable called array is not an array at all. It is a pointer. That means it only contains the address of one object (in this case the first element of the array created using new).

For range based for to work the compiler needs two addresses, the beginning and the end of the array.

So the problem is the compiler does not have enough information to do this:

// array is only a pointer and does not have enough information
for(int i : array)
{
} 
Galik
  • 47,303
  • 4
  • 80
  • 117
5
int* array = new int[len];
for[] (int i : array) {}

There are several points which must be addressed; I'll tackle them one at a time.

Does the run-time knows the size of the array?

In certain conditions, it must. As you pointed out, a call to delete[] will call the destructor of each element (in reserve order) and therefore must know how many there are.

However, by not specifying that the number of elements must be known, and accessible, the C++ standard allows an implementation to omit it whenever the call to the destructor is not required (std::is_trivially_destructible<T>::value evaluates to true).

Can the run-time distinguish between pointer and array?

In general, no.

When you have a pointer, it could point to anything:

  • a single item, or an item in an array,
  • the first item in an array, or any other,
  • an array on the stack, or an array on the heap,
  • just an array, or an array part of a larger object.

This is the reason what delete[] exists, and using delete here would be incorrect. With delete[], you the user state: this pointer points to the first item of a heap-allocated array.

The implementation can then assume that, for example, in the 8 bytes preceding this first item it can find the size of the array. Without you guaranteeing this, those 8 bytes could be anything.

Then, why not go all the way and create for[] (int i : array)?

There are two reasons:

  1. As mentioned, today an implementation can elide the size on a number of elements; with this new for[] syntax, it would no longer be possible on a per-type basis.
  2. It's not worth it.

Let us be honest, new[] and delete[] are relics of an older time. They are incredibly awkward:

  • the number of elements has to be known in advance, and cannot be changed,
  • the elements must be default constructible, or otherwise C-ish,

and unsafe to use:

  • the number of elements is inaccessible to the user.

There is generally no reason to use new[] and delete[] in modern C++. Most of the times a std::vector should be preferred; in the few instances where the capacity is superfluous, a std::dynarray is still better (because it keeps track of the size).

Therefore, without a valid reason to keep using these statements, there is no motivation to include new semantic constructs specifically dedicated to handling them.

And should anyone be motivated enough to make such a proposal:

  • the inhibition of the current optimization, a violation of C++ philosophy of "You don't pay for what you don't use", would likely be held against them,
  • the inclusion of new syntax, when modern C++ proposals have gone to great lengths to avoid it as much as possible (to the point of having a library defined std::variant), would also likely be held against them.

I recommend that you simply use std::vector.

Matthieu M.
  • 287,565
  • 48
  • 449
  • 722
3

This is not related to dynamic arrays, it is more general. Of course for dynamic arrays there exists somewhere the size to be able to call destructors (but remember that standard doesn't says anything about that, just that calling delete [] works as intended).

The problem is with pointers in general as given a pointer you can't tell if it correspond to any kind of...what?

Arrays decay to pointers but given a pointer what can you say?

Jean-Baptiste Yunès
  • 34,548
  • 4
  • 48
  • 69
1

array is not an array, but a pointer and there's no information about the size of the "array". So, compiler can not deduce begin and end of this array.

See the syntax of range based for loop:

{
   auto && __range = range_expression ; 
   for (auto __begin = begin_expr, __end = end_expr; 
   __begin != __end; ++__begin) { 
   range_declaration = *__begin; 
   loop_statement 
   } 
} 

range_expression - any expression that represents a suitable sequence (either an array or an object for which begin and end member functions or free functions are defined, see below) or a braced-init-list.

auto works at compile time.So, begin_expr and end_expr doesn't at deduct runtime.

msc
  • 33,420
  • 29
  • 119
  • 214
  • 1
    But why? Could `for[]` define `begin_expr` to be `range_expression` and `end_expr` to be `range_expression + size`, where `size` is the size of the array (as used in `delete[]`)? – Balint Laczay Nov 18 '17 at 07:24
1

The reason is that, given only the value of the pointer array, the compiler (and your code) has no information about what it points at. The only thing known is that array has a value which is the address of a single int.

It could point at the first element of a statically allocated array. It could point at an element in the middle of a dynamically allocated array. It could point at a member of a data structure. It could point at an element of an array that is within a data structure. The list goes on.

Your code will make ASSUMPTIONS about what the pointer points at. It may assume it is an array of 50 elements. Your code may access the value of len, and assume array points at the (first element of) an array of len elements. If your code gets it right, all works as intended. If your code gets it wrong (e.g. accessing the 50th element of an array with 5 elements) then the behaviour is simply undefined. It is undefined because the possibilities are endless - the book-keeping to keep track of what an arbitrary pointer ACTUALLY points at (beyond the information that there is an int at that address) would be enormous.

You're starting with the ASSUMPTION that array points at the result from new int[len]. But that information is not stored in the value of array itself, so the compiler has no way to work back to a value of len. That would be needed for your "range based" approach to work.

While, yes, given array = new int[len], the machinery invoked by delete [] array will work out that array has len elements, and release them. But delete [] array also has undefined behaviour if array results from something other than a new [] expression. Even

  int *array = new int;
  delete [] array;

gives undefined behaviour. The "runtime" is not required to work out, in this case, that array is actually the address of a single dynamically allocated int (and not an actual array). So it is not required to cope with that.

Peter
  • 35,646
  • 4
  • 32
  • 74