1

On my class on university I need to create program in C++ which find all divisors of big number. I need to do it in several ways. One of them is to use OpenMP. So far i have this:

void printDivisors(unsigned long long n) {
    stack<unsigned long long> numbers;
    #pragma omp parallel for shared(numbers)
    for (unsigned long long i = 1; i <= n; i++) {
        if (n % i == 0) {
            numbers.push(i);
        }
    }

    while( !numbers.empty() ){
        printf("%lld ", numbers.top());
        numbers.pop();
    }
}

It take so long than use synchronized version (I know that its better algorithm but still too long):

void printDivisors(long long n)
{
    for (long long i = 1; i*i < n; i++) {
        if (n % i == 0)
            printf("%lld ", i);
    }
    for (long long i = sqrt(n); i >= 1; i--) {
        if (n % i == 0)
            printf("%lld ", n / i);
    }
}

I've tried to add schedule(static, n) (after shared) but program simply ended after reach to openmp section...

I'm using CLion with CMake and run it in Docker container:

cmake_minimum_required(VERSION 3.16.3)
project(projekt_dzielniki)

set(CMAKE_CXX_STANDARD 11)

OPTION (USE_OpenMP "Use OpenMP" ON)
add_executable(projekt_dzielniki_openmp openmp.cpp openmp.h)

FIND_PACKAGE( OpenMP REQUIRED)
if(OPENMP_FOUND)
    message("OPENMP FOUND")
    set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${OpenMP_C_FLAGS}")
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${OpenMP_CXX_FLAGS}")
    set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} ${OpenMP_EXE_LINKER_FLAGS}")
endif()

Dockerfile:

FROM ubuntu:20.04

RUN DEBIAN_FRONTEND="noninteractive" apt-get update && apt-get -y install tzdata

RUN apt-get update \
  && apt-get install -y ssh \
      build-essential \
      gcc \
      g++
      gdb \
      clang \
      cmake \
      rsync \
      tar \
      python \
  && apt-get clean

RUN ( \
    echo 'LogLevel DEBUG2'; \
    echo 'PermitRootLogin yes'; \
    echo 'PasswordAuthentication yes'; \
    echo 'Subsystem sftp /usr/lib/openssh/sftp-server'; \
  ) > /etc/ssh/sshd_config_test_clion \
  && mkdir /run/sshd

RUN useradd -m user \
  && yes password | passwd user

RUN usermod -s /bin/bash user

CMD ["/usr/sbin/sshd", "-D", "-e", "-f", "/etc/ssh/sshd_config_test_clion"]

EDIT: Example number: 922337203685477580 tooked for sync version 22.69 but OpenMP more than 15 min...

Wiszen
  • 77
  • 1
  • 4
  • 12

1 Answers1

3

Since it is your university work I give you hints not solution (code). The problem is with the OpenMP code is that numbers are shared, and write operations to containers and container adapters from more than one thread are not required by the C++ standard to be thread safe. So you have to add an openmp directive to protect it. Alternatively, you can create a used defined reduction in openmp.

Why are you surprised that the 1st algorithm is so slow? That is the expected behaviour (it is not related to OpenMP, just to the algorithm). ps: I have changed the 1st algorithm and measured runtimes: A modified 1st algorithm using OpenMP (4 cores+hyperthreading) =1.4s, your second algorithm=15.5 s

EDIT: More hints:

Laci
  • 2,738
  • 1
  • 13
  • 22
  • Can you give me more hints about directive for container? Or maybe you can share your code? I know what you've point in a first sentence but I'm not C++ developer and I wont be (working with Apex Oracle) – Wiszen Jul 07 '21 at 23:06
  • @Wiszen The example on page 171 ("6.1 The `critical` construct") of [this document](https://openmp.org/wp-content/uploads/openmp-examples-4.5.0.pdf) should help. – paleonix Jul 09 '21 at 09:43
  • @PaulG. So i've added `#pragma omp critical` before `numbers.push(i)` but still it is slow... – Wiszen Jul 13 '21 at 10:52
  • Well the critical sections are effectively sequentialized. I never said that it would be fast in your example, but at least it should give the right result with out any race conditions. You could try a user defined reduction like [here](https://stackoverflow.com/a/67920884/10107454) to avoid the sequentialization although I don't know if is actually the problem here. – paleonix Jul 13 '21 at 11:11
  • In the first example the algorithm is slow, revise this : `for (unsigned long long i = 1; i <= n; i++) {` Do you really neen to iterate to `n`? ps: I also added `#pragma omp critical` in my version and openMP made it much faster than the serial one. – Laci Jul 13 '21 at 12:31
  • Finally I've made it! Thank you @PaulG. and @Laci for many tips. I've cut in `for` loop `n` to `sqrtl(n)` (so easy...) and now it is much faster. – Wiszen Jul 13 '21 at 21:27