
I am writing an application that does its computation in C++ and then returns the multidimensional results as a NumPy array using pybind11. According to the pybind11 documentation and the examples I have seen online, creating the NumPy array basically means passing a pointer to the data along with details on the strides. On the C++ side, however, I am not keen on using a one-dimensional array with some fancy indexing; I would rather use structs. That got me wondering whether (homogeneous) variables placed in contiguous memory can be treated as part of an array.

My train of thought was the following. The elements of an array are placed in contiguous memory. The members of a struct are also laid out contiguously in the order of their declaration (when no padding is involved). So the following four variable declarations are the same in terms of memory layout: if I point a pointer at the first element, I can iterate through all the elements by stepping one int at a time:

struct struct_array
{
    int elem[4] = {};
};

struct struct_ints
{
    int a = {};
    int b = {};
    int c = {};
    int d = {};
};

// integer matrix of shape 3x4
int one_dim_array[3 * 4] = {};
int two_dim_array[3][4] = {};
struct_array array_of_struct_arrays[3] = {};
struct_ints array_of_struct_ints[3] = {};
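
The "when no padding is involved" assumption can at least be checked at compile time. The following is a minimal sketch, assuming the two struct definitions above; `layout_matches_int_matrix` is a hypothetical helper name introduced only for illustration. Note that passing these checks only confirms the byte layout. It does not by itself make pointer arithmetic from one member to the next defined behaviour.

```cpp
#include <cstddef>
#include <type_traits>

struct struct_array
{
    int elem[4] = {};
};

struct struct_ints
{
    int a = {};
    int b = {};
    int c = {};
    int d = {};
};

// offsetof is only unconditionally supported for standard-layout types.
static_assert(std::is_standard_layout<struct_ints>::value,
              "struct_ints must be standard layout");
static_assert(std::is_standard_layout<struct_array>::value,
              "struct_array must be standard layout");

// Hypothetical helper: true when both structs occupy exactly four ints
// and the members of struct_ints sit at consecutive int-sized offsets.
constexpr bool layout_matches_int_matrix()
{
    return sizeof(struct_array) == 4 * sizeof(int) &&
           sizeof(struct_ints) == 4 * sizeof(int) &&
           offsetof(struct_ints, a) == 0 * sizeof(int) &&
           offsetof(struct_ints, b) == 1 * sizeof(int) &&
           offsetof(struct_ints, c) == 2 * sizeof(int) &&
           offsetof(struct_ints, d) == 3 * sizeof(int);
}
```

A caller could write `static_assert(layout_matches_int_matrix(), "unexpected padding");` so that the build fails on a platform where the assumption does not hold.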

Here is my test code, which suggests that the answer to my question is yes. It prints some addresses, then sets and reads elements.

#include <iostream>

struct struct_array
{
    int elem[4] = {};
};

struct struct_ints
{
    int a = {};
    int b = {};
    int c = {};
    int d = {};
};

int main(void)
{
    const int rows = 3;
    const int cols = 4;

    int one_dim_array[rows * cols] = {};
    int two_dim_array[rows][cols] = {};
    struct_array array_of_struct_arrays[rows] = {};
    struct_ints array_of_struct_ints[rows] = {};

    std::cout << sizeof(int) << " is the size of an int in bytes\n";

    std::cout << "\nOne dim array\n";
    for (int i = 0; i < rows * cols; ++i)
    {
        one_dim_array[i] = i;
        std::cout << &one_dim_array[i] << "\n";
    }

    std::cout << "\nTwo dim array\n";
    for (int i = 0; i < rows; ++i)
    {
        for (int j = 0; j < cols; ++j)
        {
            two_dim_array[i][j] = i * cols + j;
            std::cout << &two_dim_array[i][j] << "\n";
        }
    }

    std::cout << "\nArray of struct arrays\n";
    for (int i = 0; i < rows; ++i)
    {
        for (int j = 0; j < cols; ++j)
        {
            array_of_struct_arrays[i].elem[j] = i * cols + j;
            std::cout << &array_of_struct_arrays[i] << " " << &array_of_struct_arrays[i].elem[j] << "\n";
        }
    }

    std::cout << "\nArray of struct ints\n";
    for (int i = 0; i < rows; ++i)
    {
        array_of_struct_ints[i].a = i * cols + 0;
        array_of_struct_ints[i].b = i * cols + 1;
        array_of_struct_ints[i].c = i * cols + 2;
        array_of_struct_ints[i].d = i * cols + 3;

        std::cout << &array_of_struct_ints[i] << " " << &array_of_struct_ints[i].a << "\n";
        std::cout << &array_of_struct_ints[i] << " " << &array_of_struct_ints[i].b << "\n";
        std::cout << &array_of_struct_ints[i] << " " << &array_of_struct_ints[i].c << "\n";
        std::cout << &array_of_struct_ints[i] << " " << &array_of_struct_ints[i].d << "\n";
    }

    // Read each container back through a plain int pointer.
    for (int i = 0; i < 4; ++i)
    {
        void *void_p = nullptr;
        switch (i)
        {
        case 0:
            void_p = &one_dim_array;
            std::cout << "\nOne dim array\n";
            break;

        case 1:
            void_p = &two_dim_array;
            std::cout << "\nTwo dim array\n";
            break;

        case 2:
            void_p = &array_of_struct_arrays;
            std::cout << "\nArray of struct arrays\n";
            break;

        case 3:
            void_p = &array_of_struct_ints;
            std::cout << "\nArray of struct ints\n";
            break;
        }
        // static_cast is the idiomatic way to convert back from void *.
        int *int_p = static_cast<int *>(void_p);
        // k instead of i, so the outer loop variable is not shadowed.
        for (int k = 0; k < rows * cols; ++k)
        {
            std::cout << *(int_p + k) << "\n";
        }
    }

    std::cout << "Hello world!";
    return 0;
}

Is this right, or am I missing something? What are your thoughts on this matter? (Apart from the fact that I should switch to std::array.) Thank you for your time!

François Andrieux
Dudly01
  • 1
    In practice, and pragmatically, you are right (for most C++ implementations on x86-64). But do take time to read the [C++11 standard](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3337.pdf). You'll find out that in principle, you are wrong. – Basile Starynkevitch Sep 22 '20 at 20:46
  • @BasileStarynkevitch: That surprises me. I have come across a "safe" compiler (when programming under a very thick NDA) that clamped pointer arithmetic to the bounds of the array - allowing one past the end of course, which meant it wasn't as safe as the compiler dudes thought it was. (And it was a C compiler anyway.) – Bathsheba Sep 22 '20 at 20:47
  • So consider using [Frama-C](http://frama-c.com/); it might have a C++ variant – Basile Starynkevitch Sep 22 '20 at 20:48
  • 2
    *Here is my test code that suggest that the answer is yes to my question* -- You'll quickly learn that running test code and seeing "good results" doesn't mean that the code is ok. For example, `int *p = new int[10];` and then `delete p;` may work, but it is still undefined behavior to use `delete` instead of `delete []`. C++ (and C) are among the few languages where you need to break out the standard document (linked to in a previous comment) if you decide to do something in an unorthodox fashion, so as to verify that what you are doing is actually legal. – PaulMcKenzie Sep 22 '20 at 20:53
  • 1
    @PaulMcKenzie That is exactly the reason I asked the question here than just relying on my code alone. – Dudly01 Sep 22 '20 at 20:55
  • @BasileStarynkevitch So basically, if pybind expects me to give a pointer to an `array` of integers, then I should suck it up and use a regular "flat" `array`, no multidim `array` and no `array` of `struct`s? – Dudly01 Sep 22 '20 at 21:32

1 Answer


You don't know that your variables are placed contiguously in memory. Your source code might look like that but that's as far as it goes.

If you want your variables to behave like an array, then use an array.

The language provides no defined way of reaching one variable from another using pointer arithmetic unless both are elements of the same array.
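
As a rough sketch of what "use an array" could look like while still keeping named access: the storage below is one contiguous array of `int`, and a small view struct gives the members names. `Matrix`, `RowView`, and `row` are hypothetical names introduced for illustration, and the view assumes at least four columns.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical type for illustration; not part of the original post.
class Matrix
{
public:
    Matrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols, 0)
    {
    }

    // Element access by (row, column); the index arithmetic lives in one place.
    int &at(std::size_t r, std::size_t c)
    {
        assert(r < rows_ && c < cols_);
        return data_[r * cols_ + c];
    }

    // All elements belong to one array, so stepping through
    // [data(), data() + size()) is well-defined pointer arithmetic.
    int *data() { return data_.data(); }
    std::size_t size() const { return data_.size(); }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<int> data_;
};

// Hypothetical named view of one row; assumes the matrix has at least 4 columns.
struct RowView
{
    int &a;
    int &b;
    int &c;
    int &d;
};

inline RowView row(Matrix &m, std::size_t r)
{
    return RowView{m.at(r, 0), m.at(r, 1), m.at(r, 2), m.at(r, 3)};
}
```

With this shape, `row(m, 1).c = 99;` writes to the same `int` that `m.data()[1 * 4 + 2]` reads for a 3x4 matrix, and the pointer returned by `data()` refers to a genuine array of `int`.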

Bathsheba
  • 1
    It is spelled out here, for `operator+` : https://eel.is/c++draft/expr.add#4 – François Andrieux Sep 22 '20 at 20:46
  • But according to [cppreference](https://en.cppreference.com/w/cpp/language/array) the elements of an array are allocated contiguously. So does that mean that the layout of the `struct` is not contiguous? What about the case where the struct only has an `array` in it? – Dudly01 Sep 22 '20 at 21:02
  • 1
    @Dudly01 The only guarantee for structs is that `&s.b > &s.a` (where `s` is a `struct_ints`). That is, members that are written afterwards (with the same access) will always be after previous members in memory. There could be padding. But even if there isn't padding, it is UB to do pointer arithmetic like `*(&s.a + 1) == s.b`, because that just isn't allowed – Artyer Sep 22 '20 at 21:10
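
Since stepping from one member to the next is not allowed, one alternative that stays within the rules is to copy the members into a real array before handing a pointer to anyone. This is only a sketch, assuming the `struct_ints` definition from the question; `flatten` is a hypothetical helper name, and the copy costs one pass over the data.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct struct_ints
{
    int a = {};
    int b = {};
    int c = {};
    int d = {};
};

// Hypothetical helper: copies `count` structs into one flat int buffer,
// member by member, so no pointer arithmetic across members is needed.
inline std::vector<int> flatten(const struct_ints *rows, std::size_t count)
{
    assert(rows != nullptr || count == 0);

    std::vector<int> flat;
    flat.reserve(count * 4);
    for (std::size_t i = 0; i < count; ++i)
    {
        flat.push_back(rows[i].a);
        flat.push_back(rows[i].b);
        flat.push_back(rows[i].c);
        flat.push_back(rows[i].d);
    }
    return flat;
}
```

The resulting `flat.data()` points into a single array of `int`, so iterating over it with an `int *` is well defined regardless of how the struct is padded.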