CUDA Thrust memory allocation issue

Question

I have Thrust code that loads a large array of data (2.4G) into memory, performs calculations whose results are stored on the host (~1.5G), frees the initial data, loads the results onto the device, performs further calculations on them, and finally reloads the initial data. The Thrust code looks like this:

thrust::host_vector<float> hostData;
// code that loads ~2.4G of data into hostData goes here
thrust::device_vector<float> deviceData = hostData;
thrust::host_vector<float> hostResult;
// code that performs calculations on deviceData and copies the result to hostResult (~1.5G) goes here
free<thrust::device_vector<float> >(deviceData);
thrust::device_vector<float> deviceResult = hostResult;
// code that performs calculations on deviceResult and also stores some results on the device goes here
free<thrust::device_vector<float> >(deviceResult);
deviceData = hostData;

where free is my own function, defined as:

template<class T> void free(T &V) {
    V.clear();
    V.shrink_to_fit();
    size_t mem_tot;
    size_t mem_free;
    cudaMemGetInfo(&mem_free, &mem_tot);
    std::cout << "Free memory : " << mem_free << std::endl;
}

template void free<thrust::device_vector<int> >(thrust::device_vector<int>& V);
template void free<thrust::device_vector<float> >(
    thrust::device_vector<float>& V);

However, I get a "thrust::system::detail::bad_alloc: std::bad_alloc: out of memory" error when copying hostData back to deviceData, even though cudaMemGetInfo reports ~6G of free memory on the device at that point. Here is the complete output from the free function:

Free memory : 6295650304
Free memory : 6063775744
terminate called after throwing an instance of 'thrust::system::detail::bad_alloc'
what():  std::bad_alloc: out of memory

This seems to indicate that the device is out of memory although plenty is free. Is this the right way to free memory for Thrust vectors? I should also note that the code works well for smaller data sizes (up to 1.5G).

What operating system, driver and GPU are you using? If this is Windows Vista/7/8, do you have TCC mode enabled? — talonmies, Jul 26 '13 at 09:16
I use a GTX Titan on Ubuntu 12.04 with CUDA 5 and driver 310.40 — Namux, Jul 26 '13 at 09:19

score 2 · Accepted Answer · answered Jul 26 '13 at 13:25

It would be useful to see a complete, compilable reproducer. However, you're probably running into memory fragmentation.

Even though a large amount of memory may be reported as being free, it's possible that it can't be allocated in a single large contiguous chunk. This fragmentation will then limit the maximum size of a single allocation that you can request.
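To make the distinction between "total free" and "largest allocatable chunk" concrete, here is a minimal toy model in plain C++. It is only an illustration: the Block struct and both function names are made up for this sketch, and it does not reflect how the CUDA driver actually manages device memory.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Toy model only: a heap is a list of blocks in address order.
// This is NOT how the CUDA driver lays out device memory.
struct Block {
    std::size_t size;  // size in arbitrary units
    bool used;         // true if the block is currently allocated
};

// Sum of all free blocks, analogous to the single "free" figure
// that a call such as cudaMemGetInfo reports.
std::size_t total_free(const std::vector<Block>& heap) {
    std::size_t total = 0;
    for (const Block& b : heap) {
        if (!b.used) total += b.size;
    }
    return total;
}

// Largest run of adjacent free blocks, i.e. the biggest single
// allocation that could succeed in this toy heap.
std::size_t largest_contiguous_free(const std::vector<Block>& heap) {
    std::size_t best = 0;
    std::size_t run = 0;
    for (const Block& b : heap) {
        run = b.used ? 0 : run + b.size;  // a used block breaks the run
        best = std::max(best, run);
    }
    return best;
}
```

For a heap laid out as free 2400, used 100, free 1500, used 50, free 2000, total_free gives 5900 while largest_contiguous_free gives 2400, so a single request for 2500 units would fail even though 5900 units are free.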

It's probably not really a question of how you are freeing memory, but more a function of what overhead allocations remain after you free the memory. The fact that you are checking the mem info and getting a large number back says to me that you are freeing your allocations correctly.

To try to work around this, one approach would be to manage and re-use your allocations carefully. For instance, if you need a large 2.4G working vector of float on the device, then allocate it once and re-use it for successive operations. Also, if any other allocations remain on the device immediately before you try to re-allocate the 2.4G vector, then try freeing those first (i.e. free all allocations you have made on the device).
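A minimal sketch of the re-use approach, assuming the results (~1.5G) always fit inside the buffer allocated for the initial data (2.4G). The names upload_into and pipeline are hypothetical helpers for this illustration, not part of Thrust, and the calculation steps are left as comments because the original code for them was not posted.

```cpp
#include <cstddef>
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>

// Hypothetical helper: copies src into an existing device buffer without
// resizing it. Returns false if src does not fit, so the caller never
// triggers a fresh large allocation by accident.
bool upload_into(thrust::device_vector<float>& buffer,
                 const thrust::host_vector<float>& src) {
    if (src.size() > buffer.size()) return false;
    thrust::copy(src.begin(), src.end(), buffer.begin());
    return true;
}

// Hypothetical outline of the three stages from the question, all sharing
// one device allocation that is made exactly once.
void pipeline(const thrust::host_vector<float>& hostData,
              thrust::host_vector<float>& hostResult) {
    // The only large allocation: sized for the biggest input.
    thrust::device_vector<float> buffer(hostData.size());

    upload_into(buffer, hostData);
    // ... stage 1: compute on buffer[0, hostData.size()), fill hostResult

    upload_into(buffer, hostResult);
    // ... stage 2: compute on buffer[0, hostResult.size())

    upload_into(buffer, hostData);  // reload the initial data, no reallocation
}
```

Because the buffer is never cleared or shrunk, the later stages must track how many elements are valid (hostResult.size() in stage 2) rather than relying on buffer.size().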

Reusing the 2.4G vector throughout the code worked around the problem. Thanks! — Namux, Jul 29 '13 at 14:11

score 1 · Answer 2 · answered Aug 12 '17 at 07:55

I'm providing this answer as I came across this question when searching for answers to the same error message / problem.

Robert Crovella's excellent answer is certainly correct. However, it may be useful for others to know that, when creating a device_vector, the capacity actually allocated can be far greater than the size requested.

The answer to "Understanding Thrust (CUDA) memory usage" explains in much better detail why Thrust behaves in this way.

In my case, on Ubuntu 16.04, Quadro K1200, CUDA toolkit 8.0, requesting a device_vector of size 67108864 (doubles) resulted in a device_vector with a capacity 8x larger (536870912) being allocated.

Requested (R) | Capacity (C)  | Total Mem  | Free Mem   | C/Free   | R/C
67108864      | 536870912     | 4238540800 | 3137077248 | 0.171137 | 0.125

The output above was from modifying some very helpful code in the answer I linked to.
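For anyone who wants to check these figures on their own setup, here is a minimal sketch (not the code from the linked answer). VectorFootprint and measure_footprint are names invented for this illustration. Note that size() and capacity() count elements, whereas the figures from cudaMemGetInfo are in bytes, so the two should not be compared without multiplying by sizeof(double).

```cpp
#include <cstddef>
#include <cuda_runtime.h>
#include <thrust/device_vector.h>

// Hypothetical result type for this sketch.
struct VectorFootprint {
    std::size_t size_elems;      // v.size(), in elements
    std::size_t capacity_elems;  // v.capacity(), in elements
    std::size_t bytes_taken;     // drop in free device memory, in bytes
};

// Allocates a device_vector<double> of `requested` elements and reports
// its size, its capacity, and how much free device memory it consumed.
VectorFootprint measure_footprint(std::size_t requested) {
    std::size_t free_before = 0, free_after = 0, total = 0;

    // The first runtime call also creates the CUDA context, so the
    // context overhead is already paid before the measurement starts.
    cudaMemGetInfo(&free_before, &total);
    thrust::device_vector<double> v(requested);
    cudaMemGetInfo(&free_after, &total);

    VectorFootprint fp;
    fp.size_elems = v.size();
    fp.capacity_elems = v.capacity();
    // Guard against underflow if another process freed memory meanwhile.
    fp.bytes_taken = free_before > free_after ? free_before - free_after : 0;
    return fp;
}
```

The allocator may round requests up, so bytes_taken can be somewhat larger than capacity_elems * sizeof(double); treat the numbers as indicative rather than exact.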
