cudaMemcpy() vs cudaMemcpyFromSymbol()

Question

I'm trying to figure out why cudaMemcpyFromSymbol() exists. It seems that everything the symbol function can do, the non-symbol functions can do too.

The symbol function appears to make it easy to move part of an array or a single index, but this could just as easily be done with the non-symbol function. I suspect the non-symbol approach will run faster, as no symbol lookup is needed. (It is not clear whether the symbol lookup is done at compile time or at run time.)

Why would I use cudaMemcpyFromSymbol() vs cudaMemcpy()?

I think you can only copy from constant memory using `cudaMemcpyFromSymbol()` but I'm not sure. — Soroosh Bateni, Feb 11 '13 at 17:27
@Soroosh129... It can also be used to copy from global `__device__` variables. — sgarizvi, Feb 11 '13 at 17:38

score 12 · Accepted Answer · answered Feb 11 '13 at 17:46


cudaMemcpyFromSymbol is the canonical way to copy from any statically defined variable in device memory.

cudaMemcpy can't be used directly to copy to or from a statically defined device variable because it requires a device pointer, and that isn't known to host code at runtime. Therefore, an API call which can interrogate the device context symbol table is required. There are two choices: cudaMemcpyFromSymbol, which does the symbol lookup and the copy in one operation, or cudaGetSymbolAddress, which returns an address that can be passed to cudaMemcpy. The former is probably more efficient if you only want to do one copy, the latter if you want to use the address multiple times in host code.
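As a rough illustration of the two choices described above, here is a minimal sketch that reads the same static device variable both ways. The names `d_counter`, `set_counter`, `write_counter`, `read_via_symbol` and `read_via_address` are made up for this example, and it assumes a working CUDA device and compilation with nvcc.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                            \
    do {                                                       \
        cudaError_t err_ = (call);                             \
        if (err_ != cudaSuccess) {                             \
            fprintf(stderr, "%s failed: %s\n", #call,          \
                    cudaGetErrorString(err_));                 \
            exit(EXIT_FAILURE);                                \
        }                                                      \
    } while (0)

// Statically defined device variable (hypothetical name).
__device__ int d_counter;

// Kernel that stores a value in the static device variable.
__global__ void set_counter(int value) { d_counter = value; }

// Host helper: launch the kernel and wait for it to finish.
void write_counter(int value) {
    set_counter<<<1, 1>>>(value);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());
}

// Choice 1: symbol lookup and copy in a single API call.
int read_via_symbol() {
    int h = 0;
    CHECK(cudaMemcpyFromSymbol(&h, d_counter, sizeof(int), 0,
                               cudaMemcpyDeviceToHost));
    return h;
}

// Choice 2: look up the device address, then use plain cudaMemcpy.
// The returned pointer could be kept and reused for later copies.
int read_via_address() {
    void *d_ptr = nullptr;
    CHECK(cudaGetSymbolAddress(&d_ptr, d_counter));
    int h = 0;
    CHECK(cudaMemcpy(&h, d_ptr, sizeof(int), cudaMemcpyDeviceToHost));
    return h;
}

int main() {
    write_counter(42);
    printf("via symbol:  %d\n", read_via_symbol());
    printf("via address: %d\n", read_via_address());
    return 0;
}
```

Both helpers should report the same value. The second form only pays off when the looked-up address is cached and used for more than one copy.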


talonmies



When a variable is defined as `__device__`, two versions of it are defined: one on the host and one on the device. Taking the address of the variable with `&` in host code yields the address of the host version; `cudaGetSymbolAddress()` yields the address of the device version. The `__device__` decorator can only be used on statically defined variables. – Roger Dahl Feb 12 '13 at 00:02

@RogerDahl: It is probably better to say that *any* statically defined device symbol (`__device__`, `__constant__`, even textures) results in the toolchain emitting two *symbols*, one in the device module, the other in the host object. The CUDA runtime sets up and maintains a dynamic mapping between these two symbols. The symbol API calls are the way of retrieving this mapping for `__constant__` and `__device__` symbols. The texture APIs retrieve the mapping for the texture symbols, etc. – talonmies Feb 12 '13 at 08:17
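To make the point about `__constant__` and `__device__` symbols concrete, the sketch below uploads to a `__constant__` array with cudaMemcpyToSymbol and then copies back only part of a `__device__` array, using the byte offset argument of cudaMemcpyFromSymbol. The names `c_scale`, `d_out`, `apply_scale` and `run_scale_demo` are invented for illustration, and a working CUDA device is assumed.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical static symbols: coefficients in constant memory,
// results in global device memory.
__constant__ float c_scale[4];
__device__ float d_out[4];

// Each thread multiplies one coefficient by 10 and stores the result.
__global__ void apply_scale() {
    int i = threadIdx.x;
    if (i < 4) d_out[i] = 10.0f * c_scale[i];
}

// Upload coefficients, run the kernel, and copy back only the last
// two results. Returns false if any CUDA call reports an error.
bool run_scale_demo(float tail[2]) {
    const float h_scale[4] = {1.0f, 2.0f, 3.0f, 4.0f};

    // Host -> constant memory, addressed through the symbol.
    if (cudaMemcpyToSymbol(c_scale, h_scale, sizeof(h_scale)) != cudaSuccess)
        return false;

    apply_scale<<<1, 4>>>();
    if (cudaGetLastError() != cudaSuccess) return false;
    if (cudaDeviceSynchronize() != cudaSuccess) return false;

    // Device -> host, starting at element 2. The offset is in bytes.
    return cudaMemcpyFromSymbol(tail, d_out, 2 * sizeof(float),
                                2 * sizeof(float),
                                cudaMemcpyDeviceToHost) == cudaSuccess;
}

int main() {
    float tail[2] = {0.0f, 0.0f};
    if (!run_scale_demo(tail)) {
        fprintf(stderr, "CUDA call failed\n");
        return 1;
    }
    printf("d_out[2] = %f, d_out[3] = %f\n", tail[0], tail[1]);
    return 0;
}
```

In both calls host code refers to the variable by its symbol, and the runtime resolves that to the device-side address.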
