
I wanted to parallelize a for loop and found std::for_each along with its execution policies. Surprisingly, the loop was not parallelized when compiled with GCC:

#include <iostream>
#include <algorithm>
#include <execution>
#include <chrono>
#include <thread>
#include <random>
#include <vector>

int main() {
    std::vector<int> foo;
    foo.reserve(1000);
    for (int i = 0; i < 1000; i++) {
        foo.push_back(i);
    }

    std::for_each(std::execution::par_unseq,
                  foo.begin(), foo.end(),
                  [](auto &&item) {
                      std::cout << item << std::endl;
                      std::random_device dev;
                      std::mt19937 rng(dev());
                      std::uniform_int_distribution<std::mt19937::result_type> dist6(10, 100);
                      std::this_thread::sleep_for(std::chrono::milliseconds(dist6(rng)));
                      std::cout << "Thread ID: " << std::this_thread::get_id() << std::endl;
                  });
}

This code still runs sequentially.

Using MSVC the code is parallelized and finishes much quicker.

GCC:

$ gcc --version
gcc (Ubuntu 10.1.0-2ubuntu1~18.04) 10.1.0
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

MSVC:

>cl.exe
Microsoft (R) C/C++ Optimizing Compiler Version 19.27.29112 for x86
Copyright (C) Microsoft Corporation.  All rights reserved.

usage: cl [ option... ] filename... [ /link linkoption... ]

CMakeLists.txt:

cmake_minimum_required(VERSION 3.17)
project(ParallelTesting)

set(CMAKE_CXX_STANDARD 20)

add_executable(ParallelTesting main.cpp)

Is there anything specific I need to do to enable parallelization with GCC as well?

ldd output of my binary:

$ ldd my_binary
    linux-vdso.so.1 (0x00007ffe9e6b9000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f79efaa0000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f79ef881000)
    libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f79ef4ad000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f79ef295000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f79eeea4000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f79f041a000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f79eeb06000)

The debug and release builds of the binary produce essentially the same ldd output.

BullyWiiPlaza
  • Have you tried changing GCC's optimization level? My reading of the cppreference docs indicates that parallelization is permitted, not required, when `par_unseq` is specified. – Dr. Watson Dec 29 '20 at 17:37
  • Libstdc++ (likely used by your GCC) doesn't have its own version of parallelized algorithms. Instead, it uses Intel TBB as a backend. Is TBB linked to your program? Libstdc++ may be also configured to use a serial backend, for more details, look here: https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/pstl/parallel_backend.h. – Daniel Langr Dec 29 '20 at 17:37
  • @Dr.Watson: Compiling with full optimizations does not help either. Daniel, I'm using a "regular" `GCC 10` version so I'm not sure how it is configured specifically and whether `TBB` is linked to it. I added the `ldd` output of my binary to the question for further investigation. – BullyWiiPlaza Dec 30 '20 at 21:08
  • @BullyWiiPlaza, I think Daniel's got the right idea. The cppreference library feature support [page](https://en.cppreference.com/w/cpp/compiler_support) suggests that GCC only supports the Parallelism TS when compiled with -ltbb. So you'll want the `tbb` library somewhere in your project, and then you'll want to `add_library` and `target_link_libraries`, as with any other lib. – Dr. Watson Dec 31 '20 at 06:25
  • `sudo apt-get install libtbb-dev` and then link with `-ltbb`, otherwise you'll get undefined symbols. – metalfox Dec 31 '20 at 11:41

2 Answers


I solved it by first upgrading my WSL Ubuntu distribution from 18.04 to 20.04. On 18.04, even after running sudo apt install gcc libtbb-dev to install TBB, I still got the following error: #error Intel(R) Threading Building Blocks 2018 is required; older versions are not supported. The error means the installed TBB version is too old.

Now, with TBB 2020.1-2 installed, it works as expected:

$ sudo apt install libtbb-dev
[sudo] password for ubuntu:
Reading package lists... Done
Building dependency tree
Reading state information... Done
libtbb-dev is already the newest version (2020.1-2).
0 upgraded, 0 newly installed, 0 to remove and 10 not upgraded.

This answer describes all the details very well.

Since I'm using CMake I also had to add the following line to my CMakeLists.txt:

# Link against the dependency of Intel TBB (for parallel C++17 algorithms)
target_link_libraries(${PROJECT_NAME} tbb)
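
To double-check which backend the parallel algorithms were built against, a small compile-time probe along the following lines may help. This is only a sketch: it assumes the macros `_PSTL_PAR_BACKEND_TBB` and `_PSTL_PAR_BACKEND_SERIAL`, which are internal libstdc++ configuration details as found in GCC 10 and not a documented interface, so they may be named differently or be absent in other versions. The function name `pstl_backend` is made up for illustration.

```cpp
#include <iostream>  // any libstdc++ header pulls in the library's configuration macros
#include <string>

// Hypothetical helper: reports which PSTL backend libstdc++ appears to have selected.
// Assumption: _PSTL_PAR_BACKEND_TBB / _PSTL_PAR_BACKEND_SERIAL are internal macros
// (seen in GCC 10 when compiling as C++17 or later), not part of any public API.
std::string pstl_backend()
{
#if defined(_PSTL_PAR_BACKEND_TBB)
    return "tbb";      // TBB headers were found, so std::execution::par can use threads
#elif defined(_PSTL_PAR_BACKEND_SERIAL)
    return "serial";   // fallback backend: execution policies are accepted but run sequentially
#else
    return "unknown";  // different library, older GCC, or pre-C++17 mode
#endif
}
```

Printing the result, for example with `std::cout << pstl_backend() << '\n';`, should show "serial" on a system where the TBB headers are not visible to the compiler, which would match the sequential behaviour described in the question.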
BullyWiiPlaza

I had the same problem, and the answer by @BullyWiiPlaza helped me use the required library and also verify the compiler's operation.

One additional issue I faced was that the library considered the work I passed to for_each(execution::par_unseq, …) too small to parallelize. I had assumed the library would arrange for each thread to call the function multiple times along different parts of the iterator sequence.

I solved the problem by creating larger chunks on my own.

typedef pair<micro_work_type::iterator, micro_work_type::iterator> work_type;

// Process one chunk: every element in the half-open range [be.first, be.second).
void
worker(work_type &be)
{
    for (auto v = be.first; v != be.second; v++) {
        // Work on *v
    }
}

[…]
        vector <work_type> chunks;
        auto pos = micro_work.begin();
        auto begin = pos;
        size_t i;
        for (i = 0; i < micro_work.size(); i++, pos++) {
            if (i > 0 && i % BATCH_SIZE == 0) {
                chunks.push_back(pair{begin, pos});
                begin = pos;
            }
        }
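        // Push the final chunk, whether it is full or partial.
        // (Testing i % BATCH_SIZE != 0 here would drop the last chunk
        // whenever the size is an exact multiple of BATCH_SIZE.)
        if (begin != pos)
            chunks.push_back(pair{begin, pos});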

        for_each(execution::par_unseq, chunks.begin(), chunks.end(), worker);
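
The chunking step above can also be factored into a reusable helper. The following is a minimal sketch, not the code I actually used: the name `make_chunks` and its signature are made up for illustration. It splits an iterator range into consecutive sub-ranges of at most `batch_size` elements, including a trailing partial chunk.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical helper: split [first, last) into consecutive half-open
// sub-ranges, each holding at most batch_size elements.
template <typename Iterator>
std::vector<std::pair<Iterator, Iterator>>
make_chunks(Iterator first, Iterator last, std::size_t batch_size)
{
    assert(batch_size > 0);  // a zero batch size would never make progress

    using diff_t = typename std::iterator_traits<Iterator>::difference_type;

    std::vector<std::pair<Iterator, Iterator>> chunks;
    auto remaining = static_cast<std::size_t>(std::distance(first, last));

    while (remaining > 0) {
        // The last chunk may be shorter than batch_size.
        const std::size_t step = remaining < batch_size ? remaining : batch_size;
        Iterator next = std::next(first, static_cast<diff_t>(step));
        chunks.emplace_back(first, next);
        first = next;
        remaining -= step;
    }
    return chunks;
}
```

The returned vector can then be passed to for_each with an execution policy and a worker that loops over each pair, as in the snippet above. Choosing the batch size remains a matter of experimentation; the sketch makes no attempt to tune it.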

Diomidis Spinellis