How can I specify a minimum compute capability to the mexcuda compiler to compile a mexfunction?

Question

I have a CUDA project in a .cu file that I would like to compile to a .mex file using mexcuda. My code uses the 64-bit floating-point atomic operation atomicAdd(double *, double), which is only supported on GPU devices of compute capability 6.0 or higher, so I need to specify this minimum as a flag when compiling.

In my standard IDE this works fine, but when compiling with mexcuda it does not work as I would like. In this post on MathWorks, the following command was suggested (edited from the comment by Joss Knight):

mexcuda('-v', 'mexGPUExample.cu', 'NVCCFLAGS=-gencode=arch=compute_60,code=sm_60')

but when I use this command on my file, the verbose output ends with the following lines:

Building with 'NVIDIA CUDA Compiler'.
nvcc -c --compiler-options=/Zp8,/GR,/W3,/EHs,/nologo,/MD -gencode=arch=compute_30,code=sm_30 -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_70,code=\"sm_70,compute_70\"

(and so on), which signals to me that the specified flag was not passed to nvcc properly. And indeed, compilation fails with the following error:

C:/path/mexGPUExample.cu(35): error: no instance of overloaded function "atomicAdd" matches the argument list. Argument types are: (double *, double)

The only other post I could find on this topic was this post on SO, but it is almost three years old, and it seemed to me more like a workaround than a true solution to the problem. I also do not understand that workaround, even after some research; otherwise I would have tried it.

Is there a setting I missed, or can this simply not be done without a workaround?

The linked post only covers `mex`, not `mexcuda`. Also, as the default is to compile for all architectures, passing that flag should not be a problem. The error is caused by something else, not by a lack of flags. — Ander Biguri, Oct 10 '18 at 15:04
@AnderBiguri you will get that error if you compile with compute_30 and attempt to use double atomicAdd. It is not OK to compile for all architectures if you use double atomicAdd in your code. And it does not mean that it is bad code. It does mean that you should not compile for compute_30, and OP is asking how to do that. — Robert Crovella, Oct 10 '18 at 16:00
@RobertCrovella well, it could have been bad code, but you are right. Perhaps I just don't know enough, but one would guess that compiling for all architectures means that code is generated that works on all architectures: not more limits, but fewer. My only suggestion to OP is to use the `mex` + `xml` version of the compilation rather than `mexcuda`. In my experience, it's just harder to tune the parameters with the latter. — Ander Biguri, Oct 10 '18 at 18:37
Generally your intuition is correct. But if you use a specific feature that is not available in a lower architecture, you cannot use that architecture. For example, warp shuffle is a cc3.0 feature. You cannot compile warp shuffle intrinsics for a cc2.0 target; you will get a compile error. — Robert Crovella, Oct 10 '18 at 18:45
In a similar fashion, `double` `atomicAdd` is a feature that requires cc6.0 or higher. You cannot compile it for a cc5.0 or lower architecture. If you attempt to do so, even if you specify multiple architectures, you will get a compile error. Therefore, those lower architectures must be removed, which is the gist of this question as it applies to the mexcuda toolchain. — Robert Crovella, Oct 10 '18 at 18:45
Thanks for your comments, gentlemen. I have managed to find a workaround by modifying the `xml` file in the MATLAB folder, which I will post as an answer for others to see. It's not elegant, but it works for me at least. I will look into the `mex` + `xml` suggestion as well, and edit that into my answer should I find anything that works. I will also see if I can notify someone from MathWorks of this 'issue'. — Floris, Oct 11 '18 at 08:15
@Ander, regarding your third comment: you are correct with respect to compiling for all architectures. Compiling for all architectures results in a larger `.mex` file, because it contains separate code for each architecture and uses the appropriate one depending on the GPU it is called on, which ensures better portability. My compiled function will not work on any GPU with a lower compute capability (at least, it shouldn't). — Floris, Oct 11 '18 at 08:33
@RobertCrovella it is certainly a fair behavior. Thanks for teaching us (me only perhaps), as usual :) — Ander Biguri, Oct 11 '18 at 08:36
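To make Robert Crovella's point concrete: every `-gencode=arch=compute_NN,...` entry on the nvcc line is a separate compilation target, so a single target below `compute_60` is enough to produce the `atomicAdd` error, no matter how many newer targets are also listed. The sketch below is a hypothetical helper (not part of MATLAB or the CUDA toolkit) that scans an nvcc command line, such as the verbose mexcuda output in the question, and lists the virtual architectures below a minimum. It assumes the targets are spelled `arch=compute_NN` as in that output, and uses 60 for compute capability 6.0.

```python
import re

# Virtual architecture of each -gencode entry, e.g. "arch=compute_30" -> "30".
# The "compute_70" inside code=\"sm_70,compute_70\" has no "arch=" prefix,
# so each -gencode entry is counted once.
ARCH_RE = re.compile(r"arch=compute_(\d+)")


def targets_below(nvcc_line, min_arch=60):
    """Return the compute_NN targets in an nvcc command line below min_arch."""
    return [int(arch) for arch in ARCH_RE.findall(nvcc_line)
            if int(arch) < min_arch]


if __name__ == "__main__":
    # nvcc line from the verbose mexcuda output in the question.
    line = ('nvcc -c --compiler-options=/Zp8,/GR,/W3,/EHs,/nologo,/MD '
            '-gencode=arch=compute_30,code=sm_30 '
            '-gencode=arch=compute_50,code=sm_50 '
            '-gencode=arch=compute_60,code=sm_60 '
            '-gencode=arch=compute_70,code=\\"sm_70,compute_70\\"')
    for arch in targets_below(line):
        print(f"compute_{arch}: no atomicAdd(double *, double)")
```

For the line from the question this reports `compute_30` and `compute_50`, which are exactly the targets that have to be removed.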

Floris · Answer 1 · 2018-10-12T08:59:39.253

I was able to work around this problem after some experimenting with the standard xml files in the MATLAB folder. The following steps allowed me to compile using mexcuda:

1) Go to the folder C:\Program Files\MATLAB\-version-\toolbox\distcomp\gpu\extern\src\mex\win64, which contains xml files for different versions of msvcpp;

2) Make a backup of the file that corresponds to the version you are using. In my case, I made a copy of the file nvcc_msvcpp2017 and named it nvcc_msvcpp2017_old, so that I always have the original.

3) Open nvcc_msvcppYEAR with Notepad and scroll to the following block of lines:

COMPILER="nvcc"
COMPFLAGS="--compiler-options=/Zp8,/GR,/W3,/EHs,/nologo,/MD $ARCHFLAGS"
ARCHFLAGS="-gencode=arch=compute_30,code=sm_30 -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_70,code=&#92;&quot;sm_70,compute_70&#92;&quot; $NVCC_FLAGS"
COMPDEFINES="--compiler-options=/D_CRT_SECURE_NO_DEPRECATE,/D_SCL_SECURE_NO_DEPRECATE,/D_SECURE_SCL=0,$MATLABMEX"
MATLABMEX="/DMATLAB_MEX_FILE"
OPTIMFLAGS="--compiler-options=/O2,/Oy-,/DNDEBUG"
INCLUDE="-I&quot;$MATLABROOT\extern\include&quot; -I&quot;$MATLABROOT\simulink\include&quot;"
DEBUGFLAGS="--compiler-options=/Z7"

4) Remove the architectures that will not allow your code to compile, i.e. in my case all architecture flags below 60:

ARCHFLAGS="-gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_70,code=&#92;&quot;sm_70,compute_70&#92;&quot; $NVCC_FLAGS"

5) After this, I was able to compile using mexcuda. You do not need to specify any architecture flags in the mexcuda call.

6) (optional) You will probably want to revert this change once you are done with the project that required it, to ensure maximum portability of the code you compile afterwards.

Note: you will need administrator permission to make these changes.
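Step 4 amounts to deleting every `-gencode` entry whose `compute_NN` is below the required minimum while keeping everything else, including `$NVCC_FLAGS`. The sketch below is a hypothetical helper (the name `drop_old_archs` is illustrative, not a MATLAB function) that performs that transformation on the ARCHFLAGS attribute value only. It assumes entries are separated by single spaces and contain no spaces themselves, as in the file shown in step 3; reading and writing the xml file, the backup from step 2 and the administrator rights from the note are left to you.

```python
import re

# Start of one -gencode entry, capturing the virtual architecture number,
# e.g. "-gencode=arch=compute_30,code=sm_30" -> "30".
GENCODE_RE = re.compile(r"-gencode=arch=compute_(\d+)")


def drop_old_archs(archflags, min_arch=60):
    """Return archflags without the -gencode entries below compute_<min_arch>.

    Assumes entries are separated by whitespace and contain no spaces,
    as in the ARCHFLAGS value of the MATLAB nvcc_msvcppYEAR xml file.
    """
    kept = []
    for token in archflags.split():
        match = GENCODE_RE.match(token)
        if match and int(match.group(1)) < min_arch:
            continue  # this target cannot compile atomicAdd(double *, double)
        kept.append(token)  # newer targets and $NVCC_FLAGS are kept as-is
    return " ".join(kept)


if __name__ == "__main__":
    # ARCHFLAGS value from step 3, copied verbatim (XML escapes included).
    old = ("-gencode=arch=compute_30,code=sm_30 -gencode=arch=compute_50,code=sm_50 "
           "-gencode=arch=compute_60,code=sm_60 "
           "-gencode=arch=compute_70,code=&#92;&quot;sm_70,compute_70&#92;&quot; $NVCC_FLAGS")
    print(drop_old_archs(old))
```

Run on the step 3 value, it produces the step 4 value. Because the XML escapes are never decoded, they are written back unchanged.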

As a curiosity: when using `mex` instead of `mexcuda`, the way to avoid messing up MATLAB's own files is to copy the xml file into your current folder. That way, when `mex` is called, instead of looking for its own `xml` file in the MATLAB path, it uses the one in the current working folder. Can you do this with `mexcuda`? — Ander Biguri, Oct 11 '18 at 08:38
Do you alter your call to `mex` in any way? Without doing so, if I place the original `xml` file in my current working folder, it still defaults to the file in the folder from step 1) of my answer above. — Floris, Oct 11 '18 at 08:53
Yes, I have altered the `xml` file, and `mex` certainly chooses the one in the current folder before the one in the MATLAB path. `mexcuda` may work differently. — Ander Biguri, Oct 11 '18 at 08:54
