134

Question: Is there an emulator for a Geforce card that would allow me to program and test CUDA without having the actual hardware?


Info:

I'm looking to speed up a few simulations of mine in CUDA, but my problem is that I'm not always around my desktop for doing this development. I would like to do some work on my netbook instead, but my netbook doesn't have a GPU. Now as far as I know, you need a CUDA capable GPU to run CUDA. Is there a way to get around this? It would seem like the only way is a GPU emulator (which obviously would be painfully slow, but would work). But whatever way there is to do this I would like to hear.

I'm programming on Ubuntu 10.04 LTS.

Narcolapser
  • Related: with AMD GPU: http://stackoverflow.com/questions/12828268/is-it-possible-to-run-cuda-on-amd-gpus , on Intel integrated graphics: http://stackoverflow.com/questions/8193242/can-i-run-cuda-on-intel – Ciro Santilli OurBigBook.com Feb 16 '17 at 17:58

7 Answers

45

This response may be too late, but it's worth noting anyway. GPU Ocelot (of which I am one of the core contributors) can be compiled without CUDA device drivers (libcuda.so) installed if you wish to use the Emulator or LLVM backends. I've demonstrated the emulator on systems without NVIDIA GPUs.
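As a rough sketch of how a build against Ocelot's runtime looks (the file names here are mine; the exact flags depend on your installation, but the project ships an OcelotConfig helper that emits the required link flags):

```shell
# Compile CUDA sources to host objects with nvcc as usual,
# then link against Ocelot's CUDA runtime instead of NVIDIA's libcudart.
nvcc -c saxpy.cu -o saxpy.o
g++ saxpy.o -o saxpy `OcelotConfig -l`
# A configure.ocelot file in the working directory selects the
# emulator or LLVM backend at run time.
```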

The emulator attempts to faithfully implement the PTX 1.4 and PTX 2.1 specifications which may include features older GPUs do not support. The LLVM translator strives for correct and efficient translation from PTX to x86 that will hopefully make CUDA an effective way of programming multicore CPUs as well as GPUs. -deviceemu has been a deprecated feature of CUDA for quite some time, but the LLVM translator has always been faster.

Additionally, several correctness checkers are built into the emulator to verify that memory accesses are aligned, that accesses to shared memory are properly synchronized, and that global memory dereferences fall within allocated regions. We have also implemented an interactive command-line debugger, inspired largely by gdb, to single-step through CUDA kernels, set breakpoints and watchpoints, etc... These tools were developed specifically to expedite the debugging of CUDA programs; you may find them useful.

Sorry about the Linux-only aspect. We've started a Windows branch (as well as a Mac OS X port) but the engineering burden is already large enough to stress our research pursuits. If anyone has any time and interest, they may wish to help us provide support for Windows!

Hope this helps.

kerrmudgeon
    Hi - are you still around? Is there any documentation on how one builds a program with Ocelot on an existing CUDA build environment? Also, does Ocelot work with Thrust? – Kerrek SB Jul 20 '11 at 13:27
  • More recent GPU Ocelot source code can be found via GitHub [gtcasl/gpuocelot](https://github.com/gtcasl/gpuocelot). – marc-medley Nov 18 '17 at 08:00
44

For those who are seeking the answer in 2016 (and even 2017) ...


Disclaimer

  • In the end, I failed to emulate a GPU.
  • It might be possible to use gpuocelot if you satisfy its list of dependencies.

I've tried to get an emulator for BunsenLabs (Linux 3.16.0-4-686-pae #1 SMP Debian 3.16.7-ckt20-1+deb8u4 (2016-02-29) i686 GNU/Linux).

I'll tell you what I've learnt.


  1. nvcc used to have a -deviceemu option back in CUDA Toolkit 3.0

    I downloaded CUDA Toolkit 3.0, installed it and tried to run a simple program:

    #include <stdio.h>
    
    __global__ void helloWorld() {
        printf("Hello world! I am %d (Warp %d) from %d.\n",
            threadIdx.x, threadIdx.x / warpSize, blockIdx.x);
    }
    
    int main() {
        int blocks, threads;
        scanf("%d%d", &blocks, &threads);
        helloWorld<<<blocks, threads>>>();
        cudaDeviceSynchronize();
        return 0;
    }
    

    Note that in CUDA Toolkit 3.0, nvcc was located in /usr/local/cuda/bin/.
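    The compile step looked something like this (the source file name is mine; -deviceemu is the relevant flag):

    ```shell
    # Compile for device emulation (CUDA Toolkit 3.0; the flag was later removed)
    /usr/local/cuda/bin/nvcc -deviceemu helloworld.cu -o helloworld
    ```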

    It turned out that I had difficulties with compiling it:

    NOTE: device emulation mode is deprecated in this release
          and will be removed in a future release.
    
    /usr/include/i386-linux-gnu/bits/byteswap.h(47): error: identifier "__builtin_bswap32" is undefined
    
    /usr/include/i386-linux-gnu/bits/byteswap.h(111): error: identifier "__builtin_bswap64" is undefined
    
    /home/user/Downloads/helloworld.cu(12): error: identifier "cudaDeviceSynchronize" is undefined
    
    3 errors detected in the compilation of "/tmp/tmpxft_000011c2_00000000-4_helloworld.cpp1.ii".
    

    I've found on the Internet that if I used gcc-4.2 or similarly ancient instead of gcc-4.9.2 the errors might disappear. I gave up.


  2. gpuocelot

    The answer by Stringer has a link to a very old gpuocelot project website. So at first I thought that the project was abandoned in 2012 or so. Actually, it was abandoned a few years later.

    Here are some up-to-date websites:

    I tried to install gpuocelot following the guide. I had several errors during installation though and I gave up again. gpuocelot is no longer supported and depends on a set of very specific versions of libraries and software.

    You might try to follow this tutorial from July, 2015 but I don't guarantee it'll work. I've not tested it.


  3. MCUDA

    The MCUDA translation framework is a linux-based tool designed to effectively compile the CUDA programming model to a CPU architecture.

    It might be useful. Here is a link to the website.


  4. CUDA Waste

    It is an emulator for Windows 7 and 8. I've not tried it, though. It doesn't seem to be developed anymore (the last commit is dated Jul 4, 2013).

    Here's the link to the project's website: https://code.google.com/archive/p/cuda-waste/


  5. CU2CL

    Last update: 12.03.2017

    As dashesy pointed out in the comments, CU2CL seems to be an interesting project. It seems to be able to translate CUDA code to OpenCL code. So if your GPU is capable of running OpenCL code then the CU2CL project might be of your interest.
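    As a rough illustration of the kind of mapping such a source-to-source translator performs (hand-written here, not actual CU2CL output), compare a CUDA kernel with its OpenCL counterpart:

    ```c
    // CUDA kernel:
    __global__ void add(float *a, float *b, float *c) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        c[i] = a[i] + b[i];
    }

    // OpenCL equivalent:
    __kernel void add(__global float *a, __global float *b, __global float *c) {
        int i = get_global_id(0);
        c[i] = a[i] + b[i];
    }
    ```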

    Links:

Mateusz Piotrowski
    It is a shame! By not providing a slow path, it is very difficult to build and test applications on just any machine. This means developers should avoid adding a dependency on CUDA. It is still usable by hobbyists and researchers for one-off projects, but not for real applications for customers. – dashesy Mar 12 '17 at 04:53
    [CU2CL](https://github.com/vtsynergy/CU2CL) seems to be active, and is worth looking at too. – dashesy Mar 12 '17 at 06:27
  • Could we somehow see who's tried which method and gotten it working? I'll upvote whichever ones I can get working – Nathan majicvr.com Apr 05 '18 at 20:11
    1. `nvcc -deviceemu` – Nathan majicvr.com Apr 05 '18 at 20:12
    2. got `gpuocelot` working – Nathan majicvr.com Apr 05 '18 at 20:12
    3. got `MCUDA` working – Nathan majicvr.com Apr 05 '18 at 20:13
    4. got `CUDA Waste` working – Nathan majicvr.com Apr 05 '18 at 20:13
    5. got `CU2CL` working – Nathan majicvr.com Apr 05 '18 at 20:13
    How about 2019? Is there an update on this?? :D CU2CL last update was on 2017.. – Rafael Sisto Mar 20 '19 at 12:46
  • As per an updated google search, there is Coriander around for this purpose... any thoughts from an occasional visitor? https://github.com/hughperkins/coriander – Rafael Sisto Mar 20 '19 at 13:03
  • what is the latest update on this topic? – Farhood ET Dec 31 '21 at 15:33
  • Update with my search in March 2022: (1) Clang compiler (https://llvm.org/docs/CompileCudaWithLLVM.html) Since Clang is mature and gaining popularity I consider this a viable option (anyone experience with clang+CUDA?) (2) Online: Google Colab (https://www.reddit.com/r/CUDA/comments/mlhaes/is_there_a_way_to_practice_and_execute_cuda_code/ and https://www.geeksforgeeks.org/how-to-run-cuda-c-c-on-jupyter-notebook-in-google-colaboratory/) (3) Online: Amazon AWS (costs a few cents per hour) – Bart Mar 17 '22 at 15:46
  • I added a new question and answer pair [here](https://stackoverflow.com/a/74250180/1569204) for the state of the art as a per 2020 but its likely to be closed as off topic. – Bruce Adams Oct 30 '22 at 09:37
36

You can also check out the gpuocelot project, which is a true emulator in the sense that PTX (the bytecode into which CUDA code is compiled) is emulated.

There's also an LLVM translator; it would be interesting to test whether it's faster than using -deviceemu.

elmattic
  • The sad part is that it's Linux-only. While I'm a Linux user by default, a small amount of the development I do is on Windows machines. -deviceemu is deprecated, so jskaggz's answer doesn't quite fit. Overall, this seems to be the best answer. – Narcolapser Jun 24 '10 at 15:21
14

The CUDA toolkit had an emulator built into it until the CUDA 3.0 release cycle. If you use one of these very old versions of CUDA, make sure to use -deviceemu when compiling with nvcc.

Jubal
    The CUDA emulator is deprecated, you're probably better off looking at gpuocelot. – Tom Jun 21 '10 at 21:12
    Plus CUDA emulator uses one native OS thread per logical CUDA thread which is terribly inefficient. – elmattic Jun 21 '10 at 22:28
11

https://github.com/hughperkins/cuda-on-cl lets you run NVIDIA® CUDA™ programs on OpenCL 1.2 GPUs (full disclosure: I'm the author)

Hugh Perkins
3

Be careful when programming with -deviceemu: there are operations that nvcc will accept in emulation mode but that fail when actually running on a GPU. This mostly involves device-host interaction.
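A common instance of this (my own illustration, not part of the original answer) is passing a host pointer straight into a kernel. Under -deviceemu everything runs in the host's address space, so it appears to work; on real hardware the kernel reads an invalid address:

```cuda
#include <cstdio>

__global__ void broken(int *p) {
    // Under -deviceemu this dereference "works", because device code
    // runs on the host and shares its address space. On a real GPU,
    // p is a host pointer and the read is invalid.
    printf("%d\n", *p);
}

int main() {
    int host_value = 42;
    broken<<<1, 1>>>(&host_value);  // host pointer: emulator-only behavior
    cudaDeviceSynchronize();
    return 0;
}
```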

And as you mentioned, prepare for some slow execution.

Sean
1

GPGPU-Sim is a GPU simulator that can run CUDA programs without using a GPU. I created a Docker image with GPGPU-Sim installed, in case that is helpful.

sriraj