
I want to create an array for a huge amount of data (e.g. ints). This array will be a representation of a 2D matrix.
I cannot use the STL because the code will be run with CUDA.
I'm wondering what the pros and cons of the following options are:

  • int arr[SIZE] - this is the simplest way to create an array. It's allocated on the stack, so it will be the fastest, but the problem is that the stack has a very limited size.
  • int* arr = new int[SIZE]
  • int** arr = new int*[DIM1] - this is the worst case in terms of efficiency, but it allows storing up to INT_MAX * INT_MAX values.

I was considering the second option. An int on my computer is 32 bits. I think that it may be too small for some test cases (if I use matrices bigger than 32k x 32k).

The third option seems to be the most flexible, but I have heard it is not good practice.

Is there another option for creating arrays like that (> 1B elements)? Is it possible to create a one-dimensional array longer than INT_MAX/2?

Szymon Żak
  • The argument (n) to a `new int[n];` operation, is a `size_t` type (not `int`), if that helps at all (not sure it makes a huge difference on CUDA, though). – Adrian Mole May 13 '20 at 15:57
  • "*I cannot use STL as it will be run with CUDA.*" Um, why not? You'll have to copy the array into CUDA's memory in order for it to work with it, right? What does it matter if the allocation for your CPU copy comes from `std::vector` or something else? – Nicol Bolas May 13 '20 at 15:59
  • You can look at sparse matrices; they can help. – Katsu May 13 '20 at 15:59
  • "_I was considering second option. Sizeof(int) on my computer is 32 bits. I think that it may be too small for some test cases (if I use matrices bigger than 32k x 32k)._" 32 bits (`unsigned int`) would allow for matrices of size 2^32 x 2^32, not 32k x 32k. – Algirdas Preidžius May 13 '20 at 15:59
  • *I cannot use STL* -- You may not be aware, but that `new int[]` is what is being done by `vector`. The `std::vector<>::data()` function gets you a pointer to that buffer. – PaulMcKenzie May 13 '20 at 16:01
  • @SzymonŻak: It should be noted that 2^32 represents 4 *giga*-integers. Your CUDA device would need 16 GB of storage for them. – Nicol Bolas May 13 '20 at 16:03
  • @AlgirdasPreidžius if I create a one-dimensional array with new int[], its max size is 2^32 (which means it's 2^16 * 2^16). I thought the size is an int, not an unsigned int. So the max size is 65k x 65k, yes? – Szymon Żak May 13 '20 at 16:04
  • @NicolBolas yes, your idea is ok, but I usually create an int* with cudaMallocManaged and then fill it from a file. So std::vector won't help here because I don't even have to use it. – Szymon Żak May 13 '20 at 16:07
  • @SzymonŻak "_I thought that is a int, not an unsigned int._" Why would one think that? What would the sense be in allowing negatively-sized arrays? As already outlined in the comments, the actual size is a `size_t` type. Which can be larger than `int`, or `unsigned int`, but is definitely `unsigned`. – Algirdas Preidžius May 13 '20 at 16:07
  • @AlgirdasPreidžius actually, I don't have much experience with C++. When I looked on the internet, there were answers saying the max size for a pointer is INT_MAX. – Szymon Żak May 13 '20 at 16:09
  • @SzymonŻak "_As I looked on internet, there was a answers that max size for pointer is INT_MAX._" So you didn't see this question, while looking "on the internet": [Is there a max array length limit in C++?](https://stackoverflow.com/questions/216259/is-there-a-max-array-length-limit-in-c)? – Algirdas Preidžius May 13 '20 at 16:19
  • @AlgirdasPreidžius you're right, sorry. – Szymon Żak May 13 '20 at 16:24
  • Check out [Thrust library](https://thrust.github.io/), which is a C++ library that supports CUDA. It will probably have container types for matrices already. If not, you can give a look at [Armadillo C++](http://arma.sourceforge.net/). – Jorge Bellon May 13 '20 at 16:28
  • @SzymonŻak: "*I usually create int* with cudaMallocManaged and then fill it from file.*" But *none* of the syntaxes you propose will work with that. – Nicol Bolas May 13 '20 at 17:32
  • @NicolBolas So how would you create it? I just set it with std::cin >> arr[i][j]. – Szymon Żak May 14 '20 at 09:24
  • @SzymonŻak: "*So how would you create it?*" ... I don't know what you're talking about. `new int[n]` allocates memory. `int name[n]` allocates memory. `cudaMallocManaged` *also* allocates memory. They obviously allocate memory in different ways, but if you're going to use the latter one, then you can't use the former two. – Nicol Bolas May 14 '20 at 13:25
  • @NicolBolas yeah, I create it with one of these; for example, let's just go with cudaMallocManaged. I allocate memory with it and then fill it from std::cin. – Szymon Żak May 16 '20 at 19:07

1 Answer


How to create a huge array in modern C++?

Using dynamic allocation [1]. In standard, hosted C++, it would typically be created using std::vector.

I cannot use STL as it will be run with CUDA.

Do you mean the standard library? In that case, the solution is to use another container that can run with CUDA.

int* arr = new int[SIZE]

Are you sure that the use of new[] is not limited for the same reason that the use of containers is limited? I would expect that new[] can't be an option if containers aren't. It is unclear what your limitations are exactly.

Regardless, you shouldn't use bare owning pointers. Use a container or at least a smart pointer.

int** arr = new int*[DIM1]

This can be useful (ignoring the use of a bare pointer for ownership, which is bad) if you need rows of different lengths or need to swap rows. Otherwise it has no advantages over a one-dimensional array.

Is it possible to create one dimensional array with bigger than INT_MAX/2 length?

Assuming a 32-bit int, INT_MAX/2 is about one billion elements, i.e. roughly 4 GB of ints. That is no problem on a 64-bit system as long as you have the memory available. On a 32-bit system it might just barely fit the address space, but you'll probably not have sufficient memory.

but I usually create int* with cudaMallocManaged and then fill it from file. So std::vector won't help here because I don't even have to use it

If you use cudaMallocManaged, then you wouldn't be using new[], so it wouldn't help here either.


[1] Unless the array needs to be so huge that it cannot be kept in memory as a whole, in which case it would have to be created as a file on the file system.

eerorika
  • Yes, I meant the standard library. std::vector and std::array cannot be created on CUDA (unless we use thrust). int[] and int[][] are limited as in standard C++. Actually, it's a good idea to put everything in a vector and then copy it to CUDA with std::vector::data(). However, I usually create arrays on CUDA with: int* arr; cudaMallocManaged(&arr, SIZE * sizeof(int)); and then fill it from STDIN or a file. – Szymon Żak May 13 '20 at 16:18
  • You could create your own allocator to call `cudaMallocManaged` under the hood, and then declare your vector with `std::vector<int, your_allocator<int>>`. – Jorge Bellon May 13 '20 at 16:30