
First time questioner :) Is it possible to transform global c-style arrays to std::arrays without breaking the code? I'm working on a project which consists of decompiling the source code of an old game. We have already managed to refactor a large part of the disassembly/decompilation output. Since it's automatic there are still sections like

  int a;
  int b[50];
  *(&a + 100) = xxx;

or

  int b[50];
  int a;
  *(&a - 100) = xxx;

and other kinds of crazy pointer arithmetic remaining, which have yet to be refactored manually. But we would like to use bounds checking for sections that have been (presumably) correctly changed to arrays.

(Ignore the text in italics, I'm keeping it just for consistency in the comments) I've found one problem so far with changing every array: sizeof(class containing array) would change. This could break code in some loops, for example

  someclass somearray[100];
  // for example, (sizeof(somearray[0]) == 50) is true
  int pointer = (int)somearray;
  pointer += 100;
  ((someclass*)pointer)->doSomething();

because pointer += 100 wouldn't be pointing to the second element, but somewhere inside the first, or even zeroth, I'm not sure (don't forget it's automatically decompiled code, hence the ugliness).

I'm thinking of changing every global array to std::array and every instance of accessing the array without the [] operator to array._Elems.

Are there any problems that might arise if I were to change global arrays to std::arrays in code such as this?

Edit You were right about the size not changing. I had an error in the testing functions. So I'll expand the question:

Is it safe to change every c-style array to std::array?

Edit Our current code is actually only runnable in debug mode, since it doesn't move variables around. Release mode crashes basically at the start of the program.

Edit Since there seems to be some confusion about what this question is asking, let me clarify: Is there some guarantee that there's no other member in the array, other than T elems[N]? Can I count on having

array<array<int,10>, 10> varname;
int* ptr = &varname[0][0];
ptr += 10;

and be sure that ptr is pointing at varname[1][0] regardless of implementation details? Although it's guaranteed that an array is contiguous, I'm not sure about this. The standard contains an implementation, but I'm not sure whether that's an example implementation or the actual definition that every implementation should adhere to, with iterator and const_iterator being the only implementation-specific parts, since only those carry the words implementation-defined (I don't have the latest specification at hand, so there might be some other differences).

Silvester
  • `void *pointer = somearray; pointer += 100` isn't even valid C++ code...you cannot do pointer arithmetic on a void pointer, because nobody knows what size its elements are. You'd need to cast to char* for example if you want to move the pointer by 100 bytes. – John Zwinck Jun 06 '13 at 13:16
  • `int v[100]` and `std::array<int, 100> v` are of the same size (all the magic is done at compile time). I'm not sure I understand the question... – 6502 Jun 06 '13 at 13:17
  • `sizeof(Class)` should not change. – juanchopanza Jun 06 '13 at 13:19
  • @JohnZwinck: IIRC gcc had an extension that treated void* as char* for pointer math purposes, and it makes sense in cases where you have the void* as a generic memory pointer and fiddle with bytes – Balog Pal Jun 06 '13 at 13:19
  • @BalogPal: despite that GCC doesn't give nearly enough warnings by default for my taste, it does give a warning about this: `t.cpp:4:13: warning: pointer of type 'void *' used in arithmetic [-Wpointer-arith] `. I've got g++ 4.7.3. – John Zwinck Jun 06 '13 at 13:24
  • Sorry, my bad with the void stuff. Changed the code to show what it really looks like – Silvester Jun 06 '13 at 13:27
  • @6502 I'd believe that, but why do you believe that? Is there a guarantee that a standard layout `struct` is the same size as its sole data member? – Yakk - Adam Nevraumont Jun 06 '13 at 13:27
  • The problem you describe is odd: if the compiled code does pointer arithmetic, it should use the element size each time, and that use should appear in the decompiled code, at least as a raw constant (unless the element size is `1` of course). – didierc Jun 06 '13 at 13:45
  • Out of plain curiosity: which version and what game do you work on? :) – quetzalcoatl Jun 06 '13 at 13:48
  • in the case of `int` arrays, the element being of size `4` (on 32 bit platforms), it certainly has been optimized into the final offset, which means that in the first case the pointer is pointing at the 25th element. – didierc Jun 06 '13 at 13:49
  • So in your hypothetical example, the code should include the class instance size somewhere. – didierc Jun 06 '13 at 13:51
  • @quetzalcoatl Might & Magic 7 – Silvester Jun 06 '13 at 13:51
  • It should be safe to use `std::array` in place of raw arrays _for legal uses of arrays_. Unfortunately the code you show contains illegal uses that produce undefined behavior. – bames53 Jun 06 '13 at 15:44
  • @Yakk: I believe it because there's no need, given what `std::array` is required to provide, to make it bigger. About what an implementation can **theoretically** do while remaining conforming things are of course different and even `helloworld.cpp` could explode, theoretically, because of **stack overflow** ;-) – 6502 Jun 06 '13 at 17:43
  • Relevant: http://stackoverflow.com/q/19103244/103167 – Ben Voigt Oct 19 '14 at 17:22

3 Answers


For one-dimensional arrays, this might work in all cases; the 2D case is trickier:

In principle, it is possible for the std::array<> template to consist only of the array itself, because its length argument is a compile-time constant that does not need to be stored. However, your STL implementation might have chosen to store it anyway, or any other data it needs. So, while '&a[n] == &a[0] + n' holds for any std::array, the expression '&a[n][0] == &a[0][0] + n*arrayWidth' might not hold for a 'std::array<std::array<int, arrayWidth>, arrayHeight>'.

Still, you might want to check whether 'sizeof(std::array<int, 100>) == sizeof(int) * 100' holds with your STL implementation. If it does, it should be safe to replace even the 2D arrays.

cmaster - reinstate monica
  • After a few days of testing everything seems to work fine so far. I've read somewhere that some implementations force specific alignments on the array structure. One more thing to look out for. – Silvester Jul 01 '13 at 20:30
  • See also http://stackoverflow.com/questions/24317220/range-based-for-on-multi-dimensional-array – Ben Voigt Oct 19 '14 at 17:20

I wonder how that replacement should even work in code full of pointer arithmetic.

/// @file array_eval.cpp
#include <iostream>
#include <array>
#include <algorithm>


int main() {
    auto dump = [](const int& n){std::cout << n << " ";};

#ifdef DO_FAIL
    std::array<int, 10> arr{};
#else
    int arr[10] = {};
#endif

    // this does not work for std::arrays
    int* p = arr; 

    std::for_each(p, p+10, dump);
    std::cout << std::endl;
    return 0;
}

And

g++ -Wall -pedantic -std=c++11 -DDO_FAIL array_eval.cpp 

of course fails:

array_eval.cpp: In function ‘int main()’:
array_eval.cpp:17:14: error: cannot convert ‘std::array<int, 10ul>’ to ‘int*’ in initialization
     int* p = arr; 
              ^
Solkar
  • That's why I mentioned the part "[changing] every instance of accessing the array without the [] operator to array._Elems", i.e. changing it to int* p = arr._Elems (or possibly &arr[0]) would work. Of course this means tons of MANUAL changes in the code, which is why I want to make sure there's no chance of breaking anything by changing the array type. – Silvester Jun 06 '13 at 15:25
  • "int* p = arr._Elems" It's arr.elems. And that actually IS a c-style T elems[N]: http://en.cppreference.com/w/cpp/header/array – Solkar Jun 06 '13 at 16:06
  • Strange that in MSVC it's really _Elems. Regardless, I don't really care if I have to expose the underlying C-style array to maintain the functionality of some parts of the code. I want to use std::array mainly for its bound checking mechanism during operator[] calls (bounds are checked in debug mode under MSVC by default). Code that relies on automatic decay to pointers will just have to be adjusted accordingly. – Silvester Jun 06 '13 at 17:26
  • You need arr.data() to obtain a pointer to the first element. – Ricky65 Dec 28 '13 at 14:31
  • @SilvesterSeredi the data member names of `std::array` are implementation details, do not use them. – Yakk - Adam Nevraumont Jul 07 '14 at 14:01

It depends on the STL implementation. I mean, the standard does not prevent an implementation of std::array from using more members, or from reserving more memory than is really necessary (for example, for debugging), but I think it is very improbable that you will find a std::array implementation that uses anything more than a T elem[N]; data member.

If we assume the std::array implementation includes just one field to store the data and allocates just the necessary memory (no more), then int v[100]; and the storage inside array<int, 100> v; will have the same layout, since from the standard:

[array.overview 23.3.2.1 p1]:

The elements of an array are stored contiguously, meaning that if a is an array<T, N> then it obeys the identity &a[n] == &a[0] + n for all 0 <= n < N.

and [class.mem 9.2 p20]:

A pointer to a standard-layout struct object, suitably converted using a reinterpret_cast, points to its initial member (or if that member is a bit-field, then to the unit in which it resides) and vice versa. [ Note: There might therefore be unnamed padding within a standard-layout struct object, but not at its beginning, as necessary to achieve appropriate alignment. —end note ]

Anyway, that depends on the compiler and the STL implementation. But the reversed code depends on the compiler too. Why are you assuming that int a; int b[50]; will place a and then the array b in memory in that order, and not the other way around, if those declarations are not part of a struct or class? The compiler could decide otherwise for performance reasons (though I think that is improbable).

Gonmator
  • As added in my last edit, we are forced to compile only in debug mode at the moment. Under release we run into exactly the problem you described, i.e. variables get shuffled around and the program doesn't even start. Regarding the rest of your answer: so if the implementation I use has _Ty _Elems[_Size] as its only member variable, the memory layout should be exactly the same as with ordinary arrays, right? – Silvester Jun 06 '13 at 17:51
  • If your implementation has only `_Ty _Elems[_Size]`, then yes, the memory layout should be exactly the same – Gonmator Jun 10 '13 at 14:58