22

I am working on compiling some CUDA kernels on a Windows system. From my understanding, the nvcc compiler requires cl.exe to compile on Windows systems, and the primary way to get it is with Visual Studio. I have therefore installed the free Community edition. I then expected to find a bin directory directly inside the VC directory, as shown in multiple other questions such as this one and this one. Instead, I need to go several layers deeper to find

C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.10.25017\bin\HostX64\x64\cl.exe

This particular project is intended to make a program that can be compiled and used on multiple different Windows systems. Do I really need to expect the cl.exe file to be this nested or did I miss some sort of installation step here? I was expecting a shorter path:

C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\bin\

Ultimately I need as simple a way as possible for users to have their environment find the cl.exe file. Generally this involves (at the highest level) setting an environment variable.

cdeterman

5 Answers

16

I had this problem in a different context (Elixir/Phoenix, Rust), but the root cause was the same: cl.exe could not be found during compilation.

My setup was:

  • Windows 10, x64
  • Visual Studio Community 2017 already installed, but only for C# development

For some reason, installing the Visual C++ Build Tools (as @cozzamara suggested) did not work. The installation stopped with an obscure error message. I guess it did not like my existing Visual Studio installation.

This is how I solved it:

  1. Start the Visual Studio Installer.
  2. Check the "Desktop development with C++" workload (screenshots here).
  3. Execute the following command before compiling (the quotes are needed because the path contains spaces):

    "C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Auxiliary\Build\vcvars64.bat"

    From then on, the cl.exe command works in that console session. Alternatively (and more conveniently for development), start the application 'Developer Command Prompt for VS 2017' or 'x64 Native Tools Command Prompt for VS 2017'.
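For a build script that cannot rely on the user opening a developer prompt first, the same step can be sketched in Python, which the PyCUDA users in this thread already have. This is a minimal sketch, not a definitive implementation: the batch file path is the one quoted above, and the helper names `parse_set_output` and `load_vcvars_env` are hypothetical. It runs the batch file in a child cmd.exe, dumps the resulting variables with `set`, and parses them into a dictionary.

```python
import subprocess

# Path taken from the answer above; adjust year/edition for your install.
VCVARS64 = (r"C:\Program Files (x86)\Microsoft Visual Studio\2017\Community"
            r"\VC\Auxiliary\Build\vcvars64.bat")


def parse_set_output(text):
    """Turn the NAME=value lines printed by cmd's `set` into a dict."""
    env = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        # Skip lines without '=' and lines with an empty name.
        if sep and name:
            env[name] = value
    return env


def load_vcvars_env(bat_path=VCVARS64):
    """Run the batch file in a child cmd.exe and return its environment.

    Windows only. `call` runs the batch file (its banner goes to nul),
    then `set` prints every variable of that cmd session.
    """
    command = f'cmd /s /c "call "{bat_path}" >nul && set"'
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return parse_set_output(result.stdout)
```

On Windows, the returned dictionary could be passed as `env=` to a later `subprocess.run(["nvcc", ...])` call, so that nvcc sees the same PATH the developer prompt would provide.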

Theo Lenndorff
  • Thank you for the answer. My set up was identical to yours and I ran into the same problem. Your solution worked! – tomosius Jul 07 '18 at 01:02
  • I installed via choco just the build tools, no visual c++: `choco install microsoft-build-tools`. Then I call the developer command prompt for vs 2017 and ... buff .. no `cl`. – Timo Oct 09 '20 at 18:56
9

Look for VCVARSALL.BAT -- that's usually at a higher level. If you run that it sets up your environment so that you can just call CL without a path.

Documentation here: https://msdn.microsoft.com/en-us/library/f2ccy3wt.aspx
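Since the install directory differs by year and edition (Community, Professional, BuildTools), one way to find the batch file without hard-coding the path is the vswhere.exe tool that ships with the Visual Studio Installer. The following is a sketch under assumptions: vswhere is in its default Installer directory, vcvarsall.bat sits in VC\Auxiliary\Build below the install path (the directory named elsewhere in this thread), and the function names are hypothetical.

```python
import os
import subprocess


def vswhere_path():
    """Default location of vswhere.exe (assumed: installer not relocated)."""
    base = os.environ.get("ProgramFiles(x86)", r"C:\Program Files (x86)")
    return base + r"\Microsoft Visual Studio\Installer\vswhere.exe"


def find_vs_install():
    """Ask vswhere for the newest install that has the C++ toolset.

    Windows only; returns an empty string when nothing matches.
    """
    result = subprocess.run(
        [vswhere_path(), "-latest", "-products", "*",
         "-requires", "Microsoft.VisualStudio.Component.VC.Tools.x86.x64",
         "-property", "installationPath"],
        capture_output=True, text=True, check=True)
    return result.stdout.strip()


def vcvarsall_command(install_path, arch="x64"):
    """Build the cmd.exe line that runs vcvarsall.bat for `arch`."""
    bat = install_path.rstrip("\\") + r"\VC\Auxiliary\Build\vcvarsall.bat"
    return f'call "{bat}" {arch}'
```

The string returned by `vcvarsall_command(find_vs_install())` is meant to be run inside a cmd.exe session before invoking cl or nvcc; it only affects that session.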

Lou Franco
3

I tried Theo's solution of configuring Visual Studio but that did not work for me. I am running Visual Studio Community 2017 on Windows 10, CUDA Toolkit 10.0. To be precise, I went to C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Auxiliary\Build and ran vcvarsamd64_x86.bat. My PyCUDA was still failing to compile because cl.exe was not found.

I ended up creating a test CUDA project in Visual Studio 2017 ('File' --> 'New Project') and selecting the appropriate CUDA entry on the left.

Then I built (Ctrl+Shift+B or go to 'Build'--> 'Build Solution') the example that showed up (which is a simple vector addition, replicated below).

#include "cuda_runtime.h"
#include "device_launch_parameters.h"

#include <stdio.h>

cudaError_t addWithCuda(int *c, const int *a, const int *b, unsigned int size);

__global__ void addKernel(int *c, const int *a, const int *b)
{
    int i = threadIdx.x;
    c[i] = a[i] + b[i];
}

int main()
{
    const int arraySize = 5;
    const int a[arraySize] = { 1, 2, 3, 4, 5 };
    const int b[arraySize] = { 10, 20, 30, 40, 50 };
    int c[arraySize] = { 0 };

    // Add vectors in parallel.
    cudaError_t cudaStatus = addWithCuda(c, a, b, arraySize);
    if (cudaStatus != cudaSuccess) {
        fprintf(stderr, "addWithCuda failed!");
        return 1;
    }

    printf("{1,2,3,4,5} + {10,20,30,40,50} = {%d,%d,%d,%d,%d}\n",
        c[0], c[1], c[2], c[3], c[4]);

    // cudaDeviceReset must be called before exiting in order for profiling and
    // tracing tools such as Nsight and Visual Profiler to show complete traces.
    cudaStatus = cudaDeviceReset();
    if (cudaStatus != cudaSuccess) {
        fprintf(stderr, "cudaDeviceReset failed!");
        return 1;
    }

    return 0;
}

// Helper function for using CUDA to add vectors in parallel.
cudaError_t addWithCuda(int *c, const int *a, const int *b, unsigned int size)
{
    int *dev_a = 0;
    int *dev_b = 0;
    int *dev_c = 0;
    cudaError_t cudaStatus;

    // Choose which GPU to run on, change this on a multi-GPU system.
    cudaStatus = cudaSetDevice(0);
    if (cudaStatus != cudaSuccess) {
        fprintf(stderr, "cudaSetDevice failed!  Do you have a CUDA-capable GPU installed?");
        goto Error;
    }

    // Allocate GPU buffers for three vectors (two input, one output).
    cudaStatus = cudaMalloc((void**)&dev_c, size * sizeof(int));
    if (cudaStatus != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed!");
        goto Error;
    }

    cudaStatus = cudaMalloc((void**)&dev_a, size * sizeof(int));
    if (cudaStatus != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed!");
        goto Error;
    }

    cudaStatus = cudaMalloc((void**)&dev_b, size * sizeof(int));
    if (cudaStatus != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed!");
        goto Error;
    }

    // Copy input vectors from host memory to GPU buffers.
    cudaStatus = cudaMemcpy(dev_a, a, size * sizeof(int), cudaMemcpyHostToDevice);
    if (cudaStatus != cudaSuccess) {
        fprintf(stderr, "cudaMemcpy failed!");
        goto Error;
    }

    cudaStatus = cudaMemcpy(dev_b, b, size * sizeof(int), cudaMemcpyHostToDevice);
    if (cudaStatus != cudaSuccess) {
        fprintf(stderr, "cudaMemcpy failed!");
        goto Error;
    }

    // Launch a kernel on the GPU with one thread for each element.
    addKernel<<<1, size>>>(dev_c, dev_a, dev_b);

    // Check for any errors launching the kernel
    cudaStatus = cudaGetLastError();
    if (cudaStatus != cudaSuccess) {
        fprintf(stderr, "addKernel launch failed: %s\n", cudaGetErrorString(cudaStatus));
        goto Error;
    }

    // cudaDeviceSynchronize waits for the kernel to finish, and returns
    // any errors encountered during the launch.
    cudaStatus = cudaDeviceSynchronize();
    if (cudaStatus != cudaSuccess) {
        fprintf(stderr, "cudaDeviceSynchronize returned error code %d after launching addKernel!\n", cudaStatus);
        goto Error;
    }

    // Copy output vector from GPU buffer to host memory.
    cudaStatus = cudaMemcpy(c, dev_c, size * sizeof(int), cudaMemcpyDeviceToHost);
    if (cudaStatus != cudaSuccess) {
        fprintf(stderr, "cudaMemcpy failed!");
        goto Error;
    }

Error:
    cudaFree(dev_c);
    cudaFree(dev_a);
    cudaFree(dev_b);

    return cudaStatus;
}

When this build succeeded, I looked at the command that was used to run the build, which contained the following path: C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.15.26726\bin\HostX86\x64. I added this to the PATH environment variable and now my PyCUDA works! (When I visited that path, I found a cl.exe there.)

TL;DR

Create a CUDA project in Visual Studio and build it. Once the build succeeds, look at the build command and copy the compiler path from there into PATH.
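The version directory in that path (14.10.25017, 14.15.26726, ...) changes with every toolset update, so a hard-coded PATH entry can go stale. As a rough sketch, assuming the VC\Tools\MSVC\<version>\bin\Host*\x64 layout shown in this thread, the directory can be searched for instead. The function names are hypothetical.

```python
from pathlib import Path


def version_key(cl_path):
    """Sort key: the toolset version directory as a tuple of ints.

    Assumed layout: <root>/VC/Tools/MSVC/<version>/bin/Host*/x64/cl.exe,
    so the version directory is four levels above cl.exe.
    """
    version = cl_path.parents[3].name
    return tuple(int(part) for part in version.split(".") if part.isdigit())


def find_cl_dirs(vs_root):
    """Return directories holding an x64-targeting cl.exe, newest first."""
    hits = Path(vs_root).glob("VC/Tools/MSVC/*/bin/Host*/x64/cl.exe")
    return [hit.parent for hit in sorted(hits, key=version_key, reverse=True)]
```

Here `vs_root` would be something like the ...\2017\Community directory; the first entry of the returned list is the candidate to prepend to PATH.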

Srikiran
  • THIS IS THE SOLUTION. After an hour of searching, running random .bat files and installing extensions, adding `C:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\VC\Tools\MSVC\14.15.26726\bin\HostX86\x64` added CL.exe to my PATH for PROFESSIONAL. – Mattkwish Feb 15 '19 at 16:00
  • Lots of answers on SO and the internet saying not to add anything VS to the PATH, but nothing else worked for me. VS 2019, I added `C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.23.28105\bin\Hostx64\x64`, worked great – dazonic Nov 04 '19 at 07:25
1

I had a similar issue, where Visual Studio 2017 could find neither CL.exe nor MIDL.exe for x64 configurations. The files were there, and they could be found from the VS command prompt, but not when building from Visual Studio (although it did work for x86).

When I raised the build output verbosity to Diagnostic (Tools => Options => Projects and Solutions => Build and Run => MSBuild project build output verbosity), I noticed that the PATH was not properly expanded in the 'SetEnv' build step for x64. But no matter how often I reinstalled Visual Studio, individual components, SDKs, and runtimes, or cleaned the registry, nothing solved it (I was almost ready to reinstall Windows).

But then I found out that Visual Studio C++ projects may import 'user.props' files from your app data folder; it's this section in the project file:

<ImportGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'" Label="PropertySheets">
   <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
</ImportGroup>

On my PC, $(UserRootDir) evaluates to C:\Users\[username]\AppData\Local\Microsoft\MSBuild\v4.0, where I found Microsoft.Cpp.xxx.user.props files. It was these files that had old paths (leftovers of earlier installations and other tooling).

So the solution for me was to delete these .props files in my AppData folder.
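To check whether such files exist on a machine, a small script can list them. This is only a sketch: it assumes $(UserRootDir) resolves under %LOCALAPPDATA% as described above, the function names are hypothetical, and it renames the files to .bak rather than deleting them, which is a more cautious variation on what I actually did.

```python
import os
from pathlib import Path


def user_props_dir():
    """Folder assumed to match $(UserRootDir) as described above."""
    local = os.environ.get("LOCALAPPDATA", "")
    return Path(local) / "Microsoft" / "MSBuild" / "v4.0"


def find_user_props(root):
    """List per-user C++ property sheets (Microsoft.Cpp.<Platform>.user.props)."""
    return sorted(Path(root).glob("Microsoft.Cpp.*.user.props"))


def back_up(props_files):
    """Rename each file to <name>.bak so the change can be undone."""
    renamed = []
    for props in props_files:
        target = props.with_name(props.name + ".bak")
        props.rename(target)
        renamed.append(target)
    return renamed
```

Calling `back_up(find_user_props(user_props_dir()))` takes the files out of the build, since the Import in the project file is conditional on the original file name existing.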

mpjanus
0

I'm not sure why, but the PATH does not seem to be updated. Try running your commands from the "Developer Command Prompt for VS 2017".

Tomas Andersson