
When I use cudaMalloc(100) it reserves more than 100 B (according to some users here, that's due to allocation granularity and housekeeping information).

Is it possible to determine how big this space will be, based on the number of bytes I need to reserve?

Thank you so much.

EDIT: I'll explain why I need to know.

I want to apply a convolution algorithm over huge images on the GPU. Since there isn't enough memory on the GPU to hold a whole image, I need to split the image into batches of rows and call the kernel several times.

In fact, I need to send two matrices: the OnlyRead matrix and the Results matrix.

I want to calculate a priori the maximum number of rows I can send to the device according to the amount of free memory.

The first cudaMalloc executes successfully, but the problem appears when trying to execute the second cudaMalloc, since the first reservation took more bytes than expected.

What I'm doing now is treating the amount of free memory as 10% less than what is reported... but that's just a magic number that came from nowhere.
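
Roughly, the batching logic I have now looks like this (a simplified sketch; rowBytes is just a placeholder for the combined size in bytes of one row of both matrices, and the 0.9 factor is the magic number I mentioned):

```cpp
#include <cuda_runtime.h>

// rowBytes: placeholder for the combined size in bytes of one row of the
// OnlyRead matrix plus one row of the Results matrix.
size_t maxRowsPerBatch(size_t rowBytes)
{
    size_t freeBytes = 0, totalBytes = 0;
    cudaMemGetInfo(&freeBytes, &totalBytes);

    // The "magic" 10% margin, because cudaMalloc reserves more than requested.
    size_t usable = (size_t)(freeBytes * 0.9);

    return usable / rowBytes;
}
```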

  • [This](http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#device-memory-accesses) may be instructive: "Any address of a variable residing in global memory or returned by one of the memory allocation routines from the driver or runtime API is always aligned to at least 256 bytes." Therefore I would expect any allocation request to at least "use up" to the next higher even multiple of 256 bytes. Having said that, AFAIK the answer to your question is not explicitly specified, so therefore trying to discover what it is and then depending on it could be risky. – Robert Crovella Nov 08 '14 at 17:53
  • @RobertCrovella may I know why an "even multiple of 256 bytes"? Why not at least use up to the next multiple of 256? – Farzad Nov 08 '14 at 18:01
  • Also [this post](http://stackoverflow.com/questions/14082964/cuda-alignment-256bytes-seriously) is relevant. – Farzad Nov 08 '14 at 18:01
  • Sorry, I meant an "integer" multiple. I used "even multiple" to refer to a non-fractional multiple, but I should have said "whole number multiple" or "integer multiple". – Robert Crovella Nov 08 '14 at 18:16
  • As I understood it, if I reserve 100 B, cudaMalloc will reserve 256 B instead, right? Apparently the texture alignment of my device is 512, so I would rather expect 512 B instead of 100. However, when reserving 100 B I get this: 535207936 <-- free memory in bytes before cudaMalloc(100), 534159360 <-- free memory after. It seems it reserves 1024*1024 (see the sketch after these comments). – Bravado Nov 09 '14 at 15:29
  • Why do you want to know? Is it because you are doing many small allocations, and don't want the overhead? If so, write your own allocator. If you're trying to develop code that relies on the runtime's behavior, STOP NOW and seek another approach. – ArchaeaSoftware Nov 10 '14 at 04:25
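
For reference, the before/after measurement described in the comments can be reproduced with a short program like this (a sketch; how much an allocation actually consumes is device- and driver-specific, so the printed number is not a guarantee):

```cpp
#include <cuda_runtime.h>
#include <cstdio>

int main()
{
    size_t freeBefore = 0, freeAfter = 0, total = 0;

    cudaMemGetInfo(&freeBefore, &total);

    void *p = nullptr;
    cudaMalloc(&p, 100);               // ask for 100 bytes

    cudaMemGetInfo(&freeAfter, &total);

    // The drop in free memory is what the allocation actually consumed,
    // which can be far larger than the 100 bytes requested.
    printf("free before: %zu\nfree after:  %zu\nconsumed:    %zu\n",
           freeBefore, freeAfter, freeBefore - freeAfter);

    cudaFree(p);
    return 0;
}
```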

1 Answer


"Is there a way to know what's the extra space that cudaMalloc is going to reserve?"

Not without violating CUDA's platform guarantees, no. cudaMalloc() returns a pointer to the requested amount of memory. You can't make any assumptions about the amount of memory that happens to be valid after the end of the requested amount - the CUDA allocator already makes use of suballocators, and unlike CPU-based memory allocators, the data structures to track free lists etc. are not interleaved with the allocated memory. So for example, it would be unwise to assume that the CUDA runtime's guarantees about the alignment of the returned pointers mean anything other than that returned pointers will have a certain alignment.

If you study the CUDA runtime's behavior, that will shed light on the behavior of that particular CUDA runtime, but the behavior may change with future releases and break your code.
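
For illustration only (a sketch of "another approach", not something the answer prescribes; rowBytes is a hypothetical placeholder for the combined size of one row of both matrices): rather than predicting the allocator's overhead, attempt the allocation and shrink the request until cudaMalloc succeeds.

```cpp
#include <cuda_runtime.h>

// Allocate the largest batch of rows that actually fits, instead of
// predicting the allocator's overhead. rowBytes is hypothetical: the
// combined size in bytes of one row of both matrices.
void *allocLargestBatch(size_t rowBytes, size_t maxRows, size_t *rowsOut)
{
    void *p = nullptr;
    size_t rows = maxRows;

    while (rows > 0) {
        cudaError_t err = cudaMalloc(&p, rows * rowBytes);
        if (err == cudaSuccess) {
            *rowsOut = rows;
            return p;
        }
        cudaGetLastError();   // reset the error state before retrying
        rows /= 2;            // shrink the request and try again
    }

    *rowsOut = 0;
    return nullptr;
}
```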

– ArchaeaSoftware