With just a cursory understanding of these libraries, they look very similar. I know that VexCL and Boost.Compute use OpenCL as a backend (although since its v1.0 release VexCL also supports CUDA as a backend) and that Thrust uses CUDA. Aside from the different backends, what's the difference between them?

Specifically, what problem space does each address, and why would I want to use one over the other?

Also, the Thrust FAQ states:

> The primary barrier to OpenCL support is the lack of an OpenCL compiler and runtime with support for C++ templates.

If this is the case, how is it possible that VexCL and Boost.Compute even exist?

1 Answer

I am the developer of VexCL, but I really like what Kyle Lutz, the author of Boost.Compute, had to say on the same subject on the Boost mailing list. In short, from the user's standpoint Thrust, Boost.Compute, AMD's Bolt, and probably Microsoft's C++ AMP all implement an STL-like API, while VexCL is an expression-template-based library that is closer to Eigen in nature (see the sketch after the list below). I believe the main difference between the STL-like libraries is their portability:

  1. On GPUs, Thrust supports only NVIDIA hardware, but it can also run on CPUs through its OpenMP and TBB backends.
  2. Bolt uses AMD extensions to OpenCL which are only available on AMD GPUs. It also provides Microsoft C++ AMP and Intel TBB backends.
  3. The only compiler that supports Microsoft C++ AMP is Microsoft Visual C++ (although work on bringing C++ AMP beyond Windows is under way).
  4. Boost.Compute seems to be the most portable solution of those, as it is based on standard OpenCL.
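
To make the distinction between the two API styles concrete, here is a minimal sketch (the vector sizes and the saxpy-like operation are illustrative only; in practice the Thrust half would be compiled with nvcc and the VexCL half with an ordinary host compiler, so the two are shown together purely for comparison):

```cpp
#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <thrust/functional.h>

#include <vexcl/vexcl.hpp>

int main() {
    const size_t n = 1024;

    // STL-like style (Thrust, Boost.Compute, Bolt): algorithms over iterator ranges.
    thrust::device_vector<double> x(n, 1.0), y(n, 2.0);
    thrust::transform(x.begin(), x.end(), y.begin(), y.begin(),
                      thrust::plus<double>()); // y[i] = x[i] + y[i]

    // Expression-template style (VexCL, similar in spirit to Eigen): the whole
    // right-hand side is turned into a single generated kernel.
    vex::Context ctx(vex::Filter::Any);
    vex::vector<double> X(ctx, n), Y(ctx, n);
    X = 1.0;
    Y = 2.0 * X + sin(Y);
}
```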

Again, all of these libraries try to implement an STL-like interface, so they have very broad applicability. VexCL was developed with scientific computing in mind. Had Boost.Compute been developed a bit earlier, I could probably have based VexCL on top of it :). Another library for scientific computing worth looking at is ViennaCL, a free open-source linear algebra library for computations on many-core architectures (GPUs, MIC) and multi-core CPUs. Have a look at [1] for a comparison of VexCL, ViennaCL, CMTL4 and Thrust in that field.

Regarding the quoted inability of Thrust developers to add an OpenCL backend: Thrust, VexCL and Boost.Compute (I am not familiar with the internals of the other libraries) all use metaprogramming techniques to do what they do. But since CUDA supports C++ templates, the job of the Thrust developers is probably a bit easier: they write metaprograms that generate CUDA programs with the help of the C++ compiler. The VexCL and Boost.Compute authors, on the other hand, write metaprograms that generate programs that generate OpenCL source code. Have a look at the slides where I tried to explain how VexCL is implemented. So I agree that Thrust's current design prohibits them from adding an OpenCL backend.
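
To illustrate the "programs that generate OpenCL source code" idea, here is a heavily simplified sketch; the `cl_type_name` helper and the `axpy` kernel are hypothetical stand-ins, not VexCL's actual generator:

```cpp
#include <sstream>
#include <string>

// Hypothetical helper mapping C++ types to their OpenCL C names.
template <typename T> const char* cl_type_name();
template <> const char* cl_type_name<float>()  { return "float"; }
template <> const char* cl_type_name<double>() { return "double"; }

// A C++ metaprogram that emits OpenCL C source for y[i] = a * x[i] + y[i].
// The resulting string would be handed to clCreateProgramWithSource() and
// clBuildProgram() at runtime.
template <typename T>
std::string axpy_kernel_source() {
    std::ostringstream src;
    src << "kernel void axpy(ulong n, " << cl_type_name<T>() << " a,\n"
        << "    global const " << cl_type_name<T>() << " *x,\n"
        << "    global " << cl_type_name<T>() << " *y)\n"
        << "{\n"
        << "    size_t i = get_global_id(0);\n"
        << "    if (i < n) y[i] = a * x[i] + y[i];\n"
        << "}\n";
    return src.str();
}
```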

[1] Denis Demidov, Karsten Ahnert, Karl Rupp, Peter Gottschling, Programming CUDA and OpenCL: A Case Study Using Modern C++ Libraries, SIAM J. Sci. Comput., 35(5), C453–C472. (an arXiv version is also available).

Update: @gnzlbg commented that there is no support for C++ functors and lambdas in OpenCL-based libraries. And indeed, OpenCL is based on C99 and is compiled from sources stored in strings at runtime, so there is no easy way to fully interact with C++ classes. But to be fair, OpenCL-based libraries do support user-defined functions and even lambdas to some extent.
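
For example, VexCL lets you register a user-defined device function with the VEX_FUNCTION macro and then use it inside vector expressions. A minimal sketch following the VexCL documentation (the context setup and the squared_radius function are illustrative):

```cpp
#include <vexcl/vexcl.hpp>

int main() {
    vex::Context ctx(vex::Filter::Any);
    const size_t n = 1024;
    vex::vector<double> X(ctx, n), Y(ctx, n), Z(ctx, n);
    X = 1.0; Y = 2.0;

    // The macro captures the function body as a string, and VexCL splices it
    // into the OpenCL source it generates for the enclosing expression.
    VEX_FUNCTION(double, squared_radius, (double, x)(double, y),
        return x * x + y * y;
        );

    Z = sqrt(squared_radius(X, Y));
}
```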

Having said that, CUDA-based libraries (and maybe C++ AMP) have the obvious advantage of an actual compile-time compiler (can you even say that?), so the integration with user code can be much tighter.
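
For contrast, a minimal sketch of that tighter integration: an ordinary C++ functor handed straight to a Thrust algorithm (the saxpy functor is illustrative). nvcc compiles operator() for the device directly, so no source-string generation is involved:

```cpp
#include <thrust/device_vector.h>
#include <thrust/transform.h>

// A plain C++ functor; nvcc compiles it for both host and device.
struct saxpy {
    float a;
    __host__ __device__
    float operator()(float x, float y) const { return a * x + y; }
};

int main() {
    thrust::device_vector<float> x(1024, 1.0f), y(1024, 2.0f);

    // y[i] = a * x[i] + y[i], with the state (a) captured in the functor.
    thrust::transform(x.begin(), x.end(), y.begin(), y.begin(), saxpy{2.0f});
}
```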

  • you might want to add that while libraries based on OpenCL do not support function objects (and lambdas) as kernels, those not based on OpenCL generally do. – gnzlbg Mar 17 '14 at 17:36
  • very nice summary of features in those libraries to emulate function object support! Nice work! The C++ for OpenCL extension magic that you mention about AMD's Bolt is just C++AMP. – gnzlbg Mar 18 '14 at 08:50
  • They do support [C++ code in OpenCL sources](https://github.com/HSA-Libraries/Bolt/blob/c2ac21d580f7023d3b3d8d2eef0c899808f0fa81/include/bolt/cl/detail/scan.inl#L88-L129). – ddemidov Mar 18 '14 at 09:51
  • That seems to be just a template function but [a bit below](https://github.com/HSA-Libraries/Bolt/blob/c2ac21d580f7023d3b3d8d2eef0c899808f0fa81/include/bolt/cl/detail/scan.inl#L152) they use a template inside an OpenCL string. Is that standard OpenCL or AMD specific? I didn't know you could do that! – gnzlbg Mar 18 '14 at 11:05
  • Yes, I've probably missed a bit. Here is the extension [specification](http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/CPP_kernel_language.pdf). And here is an [announcement](http://developer.amd.com/community/blog/2012/05/21/opencl-1-2-and-c-static-kernel-language-now-available/). – ddemidov Mar 18 '14 at 11:11
  • Thanks! nice extension! – gnzlbg Mar 18 '14 at 11:14
  • And btw, it is possible to do the same thing with standard OpenCL. [Here](https://github.com/ddemidov/vexcl/blob/master/vexcl/scan.hpp) is a VexCL implementation of scan algorithm adapted from [Bolt version](https://github.com/HSA-Libraries/Bolt/blob/master/include/bolt/cl/scan_kernels.cl). See [this answer](http://stackoverflow.com/questions/22467579/need-to-convert-c-template-to-c99-code/22475483#22475483) for clarification. – ddemidov Mar 18 '14 at 11:17
  • It looks like if you can write the string with "template" inside, you can also write different "overloads" for the different types as long as you have a `type_name()` function. It is certainly worth the effort to go this far with OpenCL (otherwise other people wouldn't have done it). However, for me the lack of native functors/lambdas/etc. is a deal-breaker and the reason to go with C++AMP/CUDA. Hopefully C++AMP will be merged into clang trunk soon. – gnzlbg Mar 18 '14 at 12:20
  • Just released CUDA 7.0 has native support for C++11 features including lambdas; hopefully Thrust will support them too soon. – Jakub Narębski Mar 23 '15 at 22:07
  • Is it possible to print out the produced OpenCL code using one of the above libraries? I am not comfortable with OpenCL, but have to port my C++ code to OpenCL, so I thought I could use one of the above libraries and somehow print out the code and use that directly. Any idea? – hamlatzis Oct 22 '20 at 09:14
  • With VexCL, if you set the environment variable VEXCL_SHOW_KERNELS=1, it will print all the generated kernels to stdout. – ddemidov Oct 22 '20 at 13:05