I'm trying to use CUDA code inside MATLAB mex, under linux. With the "whole program compilation" mode, it works good for me. I take the following two steps inside Nsight:
(1) Add "-fPIC" as a compiler option to each .cpp or .cu file, then compile them separately, each producing a .o file.
(2) Set the linker command to be "mex" and add "-cxx" to indicate that the type of all the .o input files are cpp files, and add the library path for cuda. Also add a cpp file that contains the mexFunction entry as an additional input.
This works good and the resulted mex file runs well under MATLAB. After that when I need to use dynamical parallelism, I have to switch to the "separate compilation mode" in Nsight. I tried the same thing above but the linker produces a lot of errors of missing reference, which I wasn't able to resolve.
Then I checked the compilation and linking steps of the "separate compilation" mode. I got confused by what it is doing. It seems that Nsight does two compilation steps for each .cpp or .cu file and produces a .o file as well as a .d file. Like this:
/usr/local/cuda-5.5/bin/nvcc -O3 -gencode arch=compute_35,code=sm_35 -odir "src" -M -o "src/tn_matrix.d" "../src/tn_matrix.cu"
/usr/local/cuda-5.5/bin/nvcc --device-c -O3 -gencode arch=compute_35,code=compute_35 -gencode arch=compute_35,code=sm_35 -x cu -o "src/tn_matrix.o" "../src/tn_matrix.cu"
The linking command is like this:
/usr/local/cuda-5.5/bin/nvcc --cudart static --relocatable-device-code=true -gencode arch=compute_35,code=compute_35 -gencode arch=compute_35,code=sm_35 -link -o "test7" ./src/cu_base.o ./src/exp_bp_wsj_dev_mex.o ./src/tn_main.o ./src/tn_matlab_helper.o ./src/tn_matrix.o ./src/tn_matrix_lib_dev.o ./src/tn_matrix_lib_host.o ./src/tn_model_wsj_dev.o ./src/tn_model_wsj_host.o ./src/tn_utility.o -lcudadevrt -lmx -lcusparse -lcurand -lcublas
What's interesting is that the linker does not take the .d file as input. So I'm not sure how it dealt with these files and how I should process them with the "mex" command when linking?
Another problem is that the linking stage has a lot of options I don't understand (--cudart static --relocatable-device-code=true), which I guess is the reason why I cannot make it work like in the "whole program compilation" mode. So I tried the following:
(1) Compile in the same way as in the beginning of the post.
(2) Preserve the linking command as provided by Nsight but change to use "-shared" option, so that the linker produces a lib file.
(3) Invoke mex with input the lib file and another cpp file containing the mexFunction entry.
This way mex compilation works and it produces a mex executable as output. However, running the resulted mex executable under MATLAB produces a segmentation fault immediately and crashes MATLAB.
I'm not sure if this way of linking would cause any problem. More strangely, I found that the mex linking step seems to finish trivially without even checking the completeness of the executable, because even if I miss a .cpp file for some function that the mexFunction will use, it still compiles.
EDIT:
I figured out how to manually link into a mex executable which can run correctly under MATLAB, but I haven't figured out how to do that automatically under Nsight, which I can in the "whole program compilation" mode. Here is my approach:
(1) Exclude from build the cpp file which contains the mexFunction entry. Manually compile it with the command "mex -c".
(2) Add "-fPIC" as a compiler option to each of the rest .cpp or .cu file, then compile them separately, each producing a .o file.
(3) Linking will fail because it cannot find the main function. We don't have it since we use mexFunction and it is excluded. This doesn't matter and I just leave it there.
(4) Follow the method in the post below to manually dlink the .o files into a device object file
cuda shared library linking: undefined reference to cudaRegisterLinkedBinary
For example, if step (2) produces a.o and b.o, here we do
nvcc -gencode arch=compute_35,code=sm_35 -Xcompiler '-fPIC' -dlink a.o b.o -o mex_dev.o -lcudadevrt
Note that here the output file mex_dev.o
should not exist, otherwise the above command will fail.
(5) Use mex command to link all the .o files produced in step (2) and step (4), with all necessary libraries supplied.
This works and produces runnable mex executable. The reason I cannot automate step (1) inside Nsight is because if I change the compilation command to "mex", Nsight will also use this command to generate a dependency file (the .d file mentioned in the question text). And the reason I cannot automate step (4) and step (5) in Nsight is because it involves two commands, which I don't know how to put them in. Please let me know if you knows how to do these. Thanks!