C/C++ allocation of arrays of array like objects

Question

I am mostly a C programmer, and I am looking for a fast and elegant solution to do what I want in C++. Let us consider this simple data structure

struct mystruct
{
    int * array1;
    int * array2;
    size_t size;
};

The two pointers array1 and array2 are to be thought as two arrays of length size. I need a huge amount of these (about 2**30 or 1.000.000.000) all of the same small size (about 100). All of them will be deallocated at the exact same time. I can do the following in C with only one call to malloc where K is the number of struct I need and N is the size of the arrays

EDITED VERSION (see the old one below)

size_t NN = N * sizeof(int);
struct mystruct * my_objects = malloc(K * sizeof(struct mystruct));
int * memory = malloc(2*K*NN);
for(i=0; i<K; ++i)
{
    my_objects[i].size = N;
    my_objects[i].array1 = memory + 2*i*NN;
    my_objects[i].array2 = memory + (2*i+1)*NN;
}
...
free(my_objects);
free(memory);

This version does not support very huge K and does not allow me to resize the array. But it is not so hard to design something for that purpose. Is there a way of creating a class in C++ that would be a kind of std::vector<mystruct> with forbidden shrinking and for which the allocation of array1 and array2 would not be based on dynamical allocation for each entry? I do want to minimize the effect of memory allocation since K is very big.

OLD VERSION:

size_t KK = K * sizeof(mystruct);
size_t NN = N * sizeof(int);
struct mystruct * my_objects = (struct mystruct *) malloc(KK + 2*K*NN);
for(i=0; i<K; ++i)
{
    my_objects[i].size = N;
    my_objects[i].array1 = (int *) (my_objects + KK + 2*i*NN);
    my_objects[i].array2 = (int *) (my_objects + KK + (2*i+1)*NN);
}

Since you mention `std::vector` you're obviously aware of it, so what is the problem that has stumped you? — Cheers and hth. - Alf, Dec 08 '15 at 15:15
I do not see how I can allocate the memory necessary for `array1` and `array2` using an `std::vector`... (I do want to minimize memory allocation calls). — V. Delecroix, Dec 08 '15 at 15:19
You could easily and trivially do it with two allocations. One for the structs, and another for the ints. I can see why you wouldn't want an allocation per object, but two allocations instead of one is not going to make a difference. — jalf, Dec 08 '15 at 15:26
I'm not an expert on this, but I wonder whether your proposed C code is actually legal, given the restrictions on pointer aliasing. — Nate Eldredge, Dec 08 '15 at 15:27
@V.Delecroix: You could possibly create such structures without any (dynamic) memory allocation at all... However, since you didn't really tell us what the *problem* is, it's somewhat difficult to whip up the proper showcase for `std::vector` or `std::array`... — DevSolar, Dec 08 '15 at 15:28
@V.Delecroix: No, you did **not** tell us whether you actually need arrays (vectors) of `N` *existing* `int`s, or just arrays (vectors) *capable* of taking `N` `int`s. You also did not tell us whether a purely stack-based solution, i.e. *without any memory allocation call at all*, would be acceptable. C and C++ are very different beasts and can do very different things. Just porting a C solution to C++ is usually not the best way to go. — DevSolar, Dec 08 '15 at 15:40
@DevSolar: I do not know in advance K and N. How can it be without dynamic memory allocation? And I would like to know whether there is a simple C++ solution (i.e. no malloc and using standard containers) that minimize memory allocation calls. — V. Delecroix, Dec 08 '15 at 15:42
you are allocating a big continuous memory. `KK + 2*K*NN`. is it OK? — alirakiyan, Dec 08 '15 at 15:44
@DevSolar: let me try to clarify. In the struct I want vector of fixed size `N` and I want `K` of them. But `N` and `K` are not known in advance, they gest known only in a given part of the code. Moreover, I might want to increase slightly `K` in the program since I am building a list of stuff and I basically want to count them. — V. Delecroix, Dec 08 '15 at 15:46
why don't you allocate memory like this? `array1 = new int[zzz];` ? — alirakiyan, Dec 08 '15 at 15:46
You don't need to multiply by `sizeof(int)` if your array is type `int *`. Pointer arithmetic will do that for you. — ams, Dec 08 '15 at 16:27
@V.Delecroix Could you go into the memory access patterns? Do you need random access here, for example, or will sequential access do? Are both `array1` and `array2` equally hot fields accessed together? — , Dec 17 '15 at 13:17

score 4 · Answer 1 · answered Dec 08 '15 at 15:51

4

Here's my literal translation from C to C++ that maintains the same memory layout:

std::unique_ptr<int[]> const memory(new int[2 * K * N]);

std::vector<mystruct> my_objects;
my_objects.reserve(K);

for (int i = 0; i < K; ++i)
{
    mystruct const tmp = {N, memory + 2*i*NN, memory + (2*i+1)*NN};
    my_objects.push_back(tmp);
}

answered Dec 08 '15 at 15:51

Simple

13,992
2
47
47

1

`auto memory = std::make_unique(2 * K * N);` is more trendy these days IMO. – Minor Threat Dec 08 '15 at 16:08
1

@MinorThreat That would value initialise all of the array elements, whereas `new int[2 * K * N]` doesn't. I tried to mimic the original code in this case. – Simple Dec 08 '15 at 16:10
Ideally this would be packaged into a `MyObjects` class that does the showcased setup in the constructor (which is given `K` and `N` as parameters) and manages the lifetime of the `unique_ptr<>`. Nice work. – DevSolar Dec 08 '15 at 16:11
Consider using emplace_back in the loop body, instead creating a local and using push_back: `my_objects.emplace_back({N, memory + 2*i*NN, memory + (2*i+1)*NN});`. Apparently copy elision cannot kick in using push_back: http://stackoverflow.com/questions/11875939/vector-push-back-rvalue-and-copy-elision. – Florian Kaufmann Dec 08 '15 at 16:30
1

@FlorianKaufmann His struct doesn't have an appropriate constructor, so `emplace_back` can't be used to initialise it in this way. – Simple Dec 08 '15 at 16:33

DevSolar · Answer 2 · 2015-12-08T16:26:59.390

1

Note: Solution created with minimal manual memory handling in mind, before OP edited in that his main requirement was performance due to a very large K. As std::vector still does behind-the-scenes memory allocations, this isn't a fast solution, just an elegant one.

Might be improved with a custom memory allocator, but I think @Simple's answer is better all-around, especially if encapsuled in a wrapper class.

struct MyStruct
{
    std::vector< int > array1;
    std::vector< int > array2;
    std::size_t size;

    MyStruct( std::size_t init_size ) :
        array1( std::vector< int >( init_size ) ),
        array2( std::vector< int >( init_size ) ),
        size( init_size )
    {}
};

// ...

std::vector< MyStruct > my_objects( K, N );

No dynamic memory allocation at all. (Well, not by you, anyway.)

edited Dec 08 '15 at 16:26

answered Dec 08 '15 at 15:46

DevSolar

67,862
21
134
209

1

Vectors perform dynamic memory allocation behind the scenes. This is a good solution (slightly more elegant than mine), but saying "no dynamic memory" is inaccurate. – Xirema Dec 08 '15 at 15:48
@Xirema: Ninja'd. ;-) – DevSolar Dec 08 '15 at 15:48
All right, some benchmark with K=16777216 and N=10: your version is 5.88sec and mine is 0.237sec. As I understand the thing, vector are doing dynamic allocation and the constructor is called for **each** element of the vector of MyStruct. – V. Delecroix Dec 08 '15 at 15:54
@V.Delecroix: Well, that's the first word you dropped about *performance*, and a requirement to allocate 160 MByte worth of memory this way. – DevSolar Dec 08 '15 at 15:59
@V.Delecroix: You could have *lead* with that information. Needing to handle 100GB of `int` arrays isn't your everyday use case. (And I wonder if your design might benefit from some improvement, I've *never* had to handle that much, and I've been working on some *real* data slingers.) – DevSolar Dec 08 '15 at 16:13

score 1 · Answer 3 · edited May 23 '17 at 11:59

1

The following does two memory allocations, one for each vector. Naturally you have to ensure that the ints vector lives longer than mystructs vector, since mystructs's members refer to ints's members.

  struct mystruct
  {
    int* array1;
    int* array2;
    std::size_t size;
  };

  std::vector<int> ints(N*2*K);
  std::vector<mystruct> mystructs(K);
  for (std::size_t i=0; i<K; i++) {
    mystruct& ms = mystructs[i];
    ms.array1 = &ints[2*N*i];
    ms.array2 = &ints[2*N*i+1];
    ms.size = N;
  }

Update: As tp1 pointed out, std::vector might reseat its internal array, invalidating all pointers into it. If you never add or remove elements, that is not an issue. If you do, consider using std::deque instead for ints. However then you also have more memory allocations upon construction, see What really is a deque in STL?. Note that sadly C++ does not allow a const std::vector of non-const elements, see Const vector of non-const objects.

edited May 23 '17 at 11:59

Community

1
1

answered Dec 08 '15 at 15:50

Florian Kaufmann

803
6
13

This solution is probably the closest, semantically speaking, to what OP is trying to do. HOWEVER: Beware RAII and the whims of stack-allocated objects! If this method exits, the memory used here will become invalid, and you'll access invalid memory, which is why I would strenuously argue against this kind of solution. – Xirema Dec 08 '15 at 15:52
1

True. The OP's question was out of the context where he intends to use the code fragment. I assumed the OP knows, as a C programmer, about the topic of dangling pointers and memory management. I had the same thought as you, that's why I added the note that one has to be careful that `myvector` outlives `ints`. – Florian Kaufmann Dec 08 '15 at 15:58
1

taking an address of contents of a vector is kinda dangerous, since vector automatically can move the contents of the vector to another location in memory, and you end up with dangling pointers. Usually such pointers should be short-lived, so that the pointer is only used only short amount of time. – tp1 Dec 08 '15 at 15:58
@tp1 That too. That might actually be the bigger issue, though based on the OP's use case, it may not matter as much. – Xirema Dec 08 '15 at 16:04

score 0 · Answer 4 · answered Dec 08 '15 at 15:42

What you are doing here in C is that allocating an array externally to your struct, and than pointing pointers to the different parts of this array.

You can do exactly the same thing with std::vector<> - have a huge vector defined outside of your struct, and point pointers to different parts of this vector. Same thing exactly.

ams · Answer 5 · 2015-12-08T16:37:34.230

If N and K are known at compile time, but can be different in different places, then a template will work:

template <int N, int K>
struct Memory {
  Memory() {
    for (int i=0; i < K; i++) {
      mystruct[i].array1 = data1[i];
      mystruct[i].array2 = data2[i];
      size[i] = N;
    }
  }

  struct mystruct {
    int * array1;
    int * array2;
    size_t size;
  } mystructs[K];

  int data1[K][N];
  int data2[K][N];
};

void f() {
  // The constructor sets up all the pointers.
  Memory *m<100,200> = new Memory<100,200>();

  .....
}

(I've not checked if that compiles.)

If the values are not known then I would not attempt to do this in one allocation; it makes more sense to do two allocations, one for an array of mystruct, and one for the integers. The extra overhead is minimal, and the code is much more maintainable.

struct Memory {
  Memory(int N, int K) {
    mystructs = new mystruct[K];
    data = new int[2*K*N];

    for (int i=0; i < K; i++) {
      array1[i] = &data1[2*i*N];
      array2[i] = &data2[(2*i+1)*N];
      size[i] = N;
    }
  }

  struct mystruct {
    int * array1;
    int * array2;
    size_t size;
  } mystruct *mystructs;

  int *data;
};

(Again, I've not checked that compiles.)

Note that where your code has 2*i*N*sizeof(int) you have a bug because C pointer arithmetic does not count bytes; it counts multiples of the pointer type. In my code I've made this explicit by taking the address of an array item, but the maths is the same.

Xirema · Answer 6 · 2015-12-08T16:53:17.430

What you're trying to do can be done using the exact same code in c++.

However, it is utterly inadvisable in c++. The reason c++ has Object-Oriented semantics is to avoid the very situation you're reckoning with. Here's how I would handle this:

struct mystruct {
    vector<int> array1;
    vector<int> array2;
    mystruct(size_t size);
}

mystruct::mystruct(size_t size) {
    array1.resize(size);
    array2.resize(size);
}

int main() {
    vector<mystruct> mystructarray(numOfStructs, numOfElementsOfArray1AndArray2);
    //EDIT: You don't need to expressly call the mystruct constructor, it'll be implicitly called with the variable passed into the vector constructor.
    //Do whatever
    return 0;
}

vector objects can be queried for their size at runtime, so there's no need to store size as a field of mystruct. And since you can define constructors for structs, it's best to handle creation of the object in that way. Finally, with a valid constructor, you can initialize an array of mystruct with a vector, passing in a valid argument for mystruct's constructor to build the vector.

DOUBLE EDIT COMBO: Alright, let's try a different approach.

Based on what you indicated in your comments, it sounds like you need to allocate a LOT of memory. I'm thinking this data has specific meaning in your application, which means it doesn't make a lot of sense to use generic data structures for your data. So here's what I'm proposing:

class mydata {
private:
    size_t num_of_sets;
    size_t size_of_arrays;

    std::vector<int> data;

public:
    mydata(size_t _sets, size_t _arrays)
        : data(_sets * _arrays * 2),
        num_of_sets(_sets),
        size_of_arrays(_arrays) {}

    int * const array1(size_t);
    int * const array2(size_t);
};

int * const mydata::array1(size_t index)
{
    return &(data[index*size_of_arrays * 2]);
}

int * const mydata::array2(size_t index)
{
    return &(data[index*size_of_arrays * 2 + size_of_arrays]);
}

int main(int argc, char** argv) {
    mydata data(16'777'216, 10);

    data.array1(5)[5] = 7;
    data.array2(7)[2] = 8;

    std::cout << "Value of index 5's array1 at index 5: " << data.array1(5)[5] << std::endl;
    std::cout << "Value of index 7's array2 at index 2: " << data.array2(7)[2] << std::endl;
    //Do Something
    return 0;
}

In this version, doesn't the mystruct constructor get called for each entry of the vector mystructarray? — V. Delecroix, Dec 08 '15 at 15:39
Yes. Which is how you should be doing it. The way you're doing memory allocation is terrible to maintain for future programmers, and only yields marginal benefits to certain optimizations in memory use/performance. — Xirema, Dec 08 '15 at 15:47
What you propose is more than 10x slower which is not acceptable for my application. And this is precisely what I asked: a fast and elegant solution. — V. Delecroix, Dec 08 '15 at 15:58
@V.Delecroix: Fast is not equal to elegant. At least, not always. — DevSolar, Dec 08 '15 at 16:01
Well, as I already said: you can use exactly the same code in C++ as you were using in C. There's nothing stopping you. But you will run into the problem of the code being much more difficult to maintain than it should be which is the #1 cause of bugs in any programming language. — Xirema, Dec 08 '15 at 16:06
@DevSolar: True. I am just trying to investigate whether it is possible to do custom memory allocation in C++... — V. Delecroix, Dec 08 '15 at 16:08
@V.Delecroix: C++ is a *beast* of a language; you can do pretty much whatever you please. You could, for example, write an [`allocator class`](http://en.cppreference.com/w/cpp/memory/allocator) that does your memory allocation up front, then pass that class as additional parameter to `std::vector` (e.g. in code similar to my answer) -- having the memory parcelled out by your allocator class, combining performance with elegance of code. It's just a bit more involved than usually done for SO answers. — DevSolar, Dec 08 '15 at 16:22
@V.Delecroix I've written a new version. Let me know if it suits your purpose or not. — Xirema, Dec 08 '15 at 16:53

C/C++ allocation of arrays of array like objects

6 Answers6