PyCUDA: Querying Device Status (Memory specifically)

Question

PyCUDA's documentation mentions Driver Interface calls in passing, but I'm a bit thick and can't see how to get information such as 'SHARED_SIZE_BYTES' out of my code.

Can anyone point me to any examples of querying the device in this way?

Is it possible to check the device state (e.g. between malloc/memcpy and kernel launch) to implement some machine-dynamic operations, and if so, how? I want to be able to deal with devices that support multiple kernels in a 'friendly' way.

Are you asking about the attribute fields given in pycuda.driver.function_attribute ? Those are not device attributes, they are per compiled function. — talonmies, Apr 20 '11 at 12:22

score 21 · Accepted Answer · edited Oct 16 '18 at 18:23

Just for anyone else coming across this: spending half an hour with the CUDA API in one hand and the PyCUDA documentation in the other does wonders. It's much simpler than my initial experiments indicated.

Runtime Kernel Info

Incoming lazy lazy code

def meminfo(kernel):
    # These are per-function (compiled kernel) attributes, not device attributes
    shared=kernel.shared_size_bytes
    regs=kernel.num_regs
    local=kernel.local_size_bytes
    const=kernel.const_size_bytes
    mbpt=kernel.max_threads_per_block
    print("=MEM=\nLocal:%d,\nShared:%d,\nRegisters:%d,\nConst:%d,\nMax Threads/B:%d" % (local,shared,regs,const,mbpt))

...
kernel=mod.get_function("foo")
meminfo(kernel)
...

Example Output

=MEM=
Local:24,
Shared:64,
Registers:18,
Const:0,
Max Threads/B:512
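
If you want to act on these numbers rather than just print them, something along these lines might work. This is only a sketch: `kernel_info` and `clamp_block_size` are hypothetical helper names of my own, not part of PyCUDA, and `fake_kernel` is a stand-in object (filled with the example output above) so the snippet runs without a GPU. With PyCUDA you would pass the result of `mod.get_function("foo")` instead.

```python
from types import SimpleNamespace

# Attribute names taken from the meminfo() example in this answer.
KERNEL_ATTRS = ("local_size_bytes", "shared_size_bytes", "num_regs",
                "const_size_bytes", "max_threads_per_block")

def kernel_info(kernel):
    """Collect the per-function attributes into a plain dict."""
    return {name: getattr(kernel, name) for name in KERNEL_ATTRS}

def clamp_block_size(kernel, requested):
    """Hypothetical helper: cap a requested 1-D block size at the kernel's limit."""
    return min(requested, kernel.max_threads_per_block)

# Stand-in for a compiled kernel (assumption: values copied from the
# example output above, not measured here).
fake_kernel = SimpleNamespace(local_size_bytes=24, shared_size_bytes=64,
                              num_regs=18, const_size_bytes=0,
                              max_threads_per_block=512)

info = kernel_info(fake_kernel)
print(info)
print(clamp_block_size(fake_kernel, 1024))
```

Having the values in a dict makes it easy to log them or to pick launch parameters per kernel instead of hard-coding them.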

Static Device Info

Incoming lazy lazy code

import pycuda.autoinit
import pycuda.driver as cuda

(free,total)=cuda.mem_get_info()
print("Global memory occupancy:%f%% free"%(free*100/total))

for devicenum in range(cuda.Device.count()):
    device=cuda.Device(devicenum)
    attrs=device.get_attributes()

    #Beyond this point is just pretty printing
    print("\n===Attributes for device %d"%devicenum)
    for (key,value) in attrs.items():
        print("%s:%s"%(str(key),str(value)))

Example Output

Global memory occupancy:70.000000% free

===Attributes for device 0
MAX_THREADS_PER_BLOCK:512
MAX_BLOCK_DIM_X:512
MAX_BLOCK_DIM_Y:512
MAX_BLOCK_DIM_Z:64
MAX_GRID_DIM_X:65535
MAX_GRID_DIM_Y:65535
MAX_GRID_DIM_Z:1
MAX_SHARED_MEMORY_PER_BLOCK:16384
TOTAL_CONSTANT_MEMORY:65536
WARP_SIZE:32
MAX_PITCH:2147483647
MAX_REGISTERS_PER_BLOCK:8192
CLOCK_RATE:1500000
TEXTURE_ALIGNMENT:256
GPU_OVERLAP:1
MULTIPROCESSOR_COUNT:14
KERNEL_EXEC_TIMEOUT:1
INTEGRATED:0
CAN_MAP_HOST_MEMORY:1
COMPUTE_MODE:DEFAULT
MAXIMUM_TEXTURE1D_WIDTH:8192
MAXIMUM_TEXTURE2D_WIDTH:65536
MAXIMUM_TEXTURE2D_HEIGHT:32768
MAXIMUM_TEXTURE3D_WIDTH:2048
MAXIMUM_TEXTURE3D_HEIGHT:2048
MAXIMUM_TEXTURE3D_DEPTH:2048
MAXIMUM_TEXTURE2D_ARRAY_WIDTH:8192
MAXIMUM_TEXTURE2D_ARRAY_HEIGHT:8192
MAXIMUM_TEXTURE2D_ARRAY_NUMSLICES:512
SURFACE_ALIGNMENT:256
CONCURRENT_KERNELS:0
ECC_ENABLED:0
PCI_BUS_ID:1
PCI_DEVICE_ID:0
TCC_DRIVER:0
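
For the "machine-dynamic" part of the question, a rough sketch of how the two queries above could feed a decision. The helper names (`attributes_by_name`, `supports_concurrent_kernels`, `fits_in_free_memory`) and the 10% headroom are my own assumptions, not PyCUDA API. It also assumes that `str()` of an attribute key yields its name, as the printed output above suggests. `FakeAttr` only exists so the snippet runs without a GPU; in real use you would pass `device.get_attributes()` and the `free` value from `cuda.mem_get_info()`.

```python
def attributes_by_name(attrs):
    """Re-key an attribute dict by the string name of each key."""
    return {str(key): value for key, value in attrs.items()}

def supports_concurrent_kernels(attrs):
    """True if the device reports CONCURRENT_KERNELS as non-zero."""
    return bool(attributes_by_name(attrs).get("CONCURRENT_KERNELS", 0))

def fits_in_free_memory(nbytes, free, headroom=0.1):
    """Hypothetical check: does an allocation fit while leaving some headroom?"""
    return nbytes <= free * (1.0 - headroom)

class FakeAttr:
    """Stand-in for an attribute key; only str() matters for this sketch."""
    def __init__(self, name):
        self.name = name
    def __str__(self):
        return self.name

# Assumption: values copied from the example output above.
sample = {FakeAttr("CONCURRENT_KERNELS"): 0,
          FakeAttr("MAX_SHARED_MEMORY_PER_BLOCK"): 16384,
          FakeAttr("WARP_SIZE"): 32}

by_name = attributes_by_name(sample)
print(by_name["MAX_SHARED_MEMORY_PER_BLOCK"])
print(supports_concurrent_kernels(sample))
print(fits_in_free_memory(100, 1000))
```

Calling `cuda.mem_get_info()` again right before the kernel launch gives the current free figure, so the same check can be repeated after each malloc/memcpy step.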

That is a device query output, but in your original question you were asking about code properties. Which are you actually interested in? — talonmies, Apr 20 '11 at 17:32
