
I have a simple program that measures floating point multiplication (and random generation; compiled with g++ -O0). Running on the host (Ubuntu 16.04) it takes ~1.6 s per run of the multiplication loop, but running in a container built from the 'ubuntu' image (without recompilation) it takes ~3.6 s. Can someone explain why it is slower by a factor of ~2.2?

P.S. I did multiple runs of the program to smooth out outliers. I don't need to optimize it; I just want a detailed explanation of what is happening there.

test.cpp

#include <cstdio>
#include <cstdlib>   // for rand() and RAND_MAX
#include <math.h>
#include <chrono>

using namespace std;
using namespace std::chrono;

// timer cribbed from
// https://gist.github.com/gongzhitaao/7062087
class Timer
{
    public:
    Timer() : beg_(clock_::now()) {}
    void reset() { beg_ = clock_::now(); }
    double elapsed() const
    {
        return duration_cast<second_>(clock_::now() - beg_).count();
    }

    private:
    typedef high_resolution_clock clock_;
    typedef duration<double, ratio<1>> second_;
    time_point<clock_> beg_;
};

// uniform double in [0, 1]
#define randf() (((double)rand()) / ((double)(RAND_MAX)))

// Times the cost of generating the random numbers alone (no multiply),
// so it can be subtracted from the full test below.
double warmup(Timer tmr) {
    tmr.reset();
    for (int i = 0; i < 100000000; i++)
    {
        double r1 = randf();
        double r2 = randf();
    }
    double elapsed = tmr.elapsed();
    return elapsed;
}

// Times random generation plus the multiply-accumulate being measured.
double test(Timer tmr) {
    double total = 0.0;
    tmr.reset();
    for (int i = 0; i < 100000000; i++)
    {
        double r1 = randf();
        double r2 = randf();
        total += r1*r2;
    }
    double elapsed = tmr.elapsed();
    return elapsed;
}

double avg(double* arr) {
    double res = 0.0;
    for (int i = 0; i < 10; i++) {
        res += *(arr + i);
    }
    return res / 10;
}


int main()
{
Timer tmr;
    

    double warmup_runs[10];
    for (int i = 0; i < 10; i++)
    {
        warmup_runs[i] = warmup(tmr);
        printf("warm - %f\n", warmup_runs[i]);
    }
    double avg_warmup = avg(warmup_runs);
    printf("avg warm - %f\n", avg_warmup);

    const int runs = 10;
    double result[runs];
    for (int i = 0; i < runs; i++)
    {
        result[i] = test(tmr);
        printf("real - %f\n", result[i]);
    }
    double avg_result = avg(result);
    printf("avg real - %f\n", avg_result);

    printf("d - %f\n", avg_result - avg_warmup);
}

Dockerfile

FROM ubuntu

WORKDIR /arythmetics

COPY a.out .

Compiled with: g++ -O0 test.cpp

To run it inside the container, after building the image I use:

docker run -it <container> /bin/bash

./a.out

UPDATE: after compiling with the -static flag, the program's run time is the same in both environments. That raises another question: why is it practically the same? Shouldn't there be some containerization overhead?
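One way to check the dynamic-linking explanation is to print which glibc the process actually loaded in each environment. Below is a minimal sketch (not part of the original program) using the glibc-specific gnu_get_libc_version() call; it assumes both the host and the container image use glibc:

#include <cstdio>
#include <gnu/libc-version.h>  // glibc-specific header

int main()
{
    // Prints the version of the C library this process is running against.
    // With dynamic linking this reflects the libc of the environment the
    // binary runs in; with -static it reflects the libc it was built against.
    printf("glibc version: %s\n", gnu_get_libc_version());
}

If the dynamically linked a.out reports different versions on the host and inside the 'ubuntu' container, the two runs are executing different rand() implementations, which would account for the timing gap; with -static the library code is baked into the binary, so both environments execute the same instructions.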

Егор Лебедев
  • Please include a minimal, reproducible example. – jkr Sep 05 '20 at 13:57
  • 3
    You're calling the `rand` function from libc, which may be implemented differently in your Docker container. For reliable results, use the exact same OS and package versions on the host and in the container, or link libc statically using something like `g++ -O0 -static-libstdc++ -static-libgcc test.cpp`. – Thomas Sep 05 '20 at 14:14
  • @Thomas if I subtract the time of the random-generation-only loop from the test function, there is still a difference: on the host ~0.03, in Docker ~1.3, so the arithmetic still looks much slower than on the host. But I get your point; I need to test your advice (see also the PRNG sketch after these comments). – Егор Лебедев Sep 05 '20 at 15:40
  • 1
    Can you run the same compiled binary in docker built `FROM scratch` ? Although you will need to produce static compiled binary for that. – Alex Yu Sep 05 '20 at 16:13
  • 1
    @AlexYu tests just with `-static` and now run time is same in docker as in host, thanks – Егор Лебедев Sep 05 '20 at 17:10
  • Then there is another question: why is it practically the same? Shouldn't there be some containerization overhead? – Егор Лебедев Sep 05 '20 at 17:13
  • 1
    I think https://stackoverflow.com/questions/21889053/what-is-the-runtime-performance-cost-of-a-docker-container#26149994 answers that question well. – Thomas Sep 05 '20 at 17:23
  • @ЕгорЛебедев I expected this. In principle there can be no "containerization overhead" for CPU-bound load: it's the same OS process as any other, just with some labels attached. I/O may differ, but I don't see any disk/network operations in the code. – Alex Yu Sep 05 '20 at 18:42
  • @AlexYu Yeah, thanks a lot. I just want to understand, at a low level, why there is no overhead – Егор Лебедев Sep 05 '20 at 18:51
  • 1
    I would recommend this: https://www.katacoda.com/openshift/courses/subsystems/container-internals-lab-2-0-part-1. Don't be confused that it uses `podman` instead of `docker` - it's the same. It explains what containers isolate and what not. 4th page directly states: "Containers are just regular Linux processes that were started as child processes of a container runtime instead of by a user running commands in a shell" – Alex Yu Sep 05 '20 at 19:19
  • @AlexYu Last question: am I right that running `./a.out` inside the container and from the host shell creates equivalent child processes (aside from the user env and other such things)? So instructions like the arithmetic (which don't depend on the environment) will be practically equal, like running two copies of the program from the host shell? – Егор Лебедев Sep 06 '20 at 11:21
  • 1
    They are not practically but absolutely equal. – Alex Yu Sep 06 '20 at 12:02
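Following up on the comment about the arithmetic itself being slower, here is a cross-check sketch (hypothetical, not part of the original program): it replaces rand() with a small inline xorshift generator so the timing no longer depends on the libc implementation. If the host and the container report similar times for this version, the difference in the original program comes from rand() rather than from the floating point multiply:

#include <cstdio>
#include <cstdint>
#include <chrono>

// Tiny xorshift64 PRNG with no libc dependency (seed chosen arbitrarily).
static uint64_t state = 88172645463325252ULL;

static inline double fast_rand()
{
    state ^= state << 13;
    state ^= state >> 7;
    state ^= state << 17;
    // Map the top 53 bits to a double in [0, 1).
    return (double)(state >> 11) * (1.0 / 9007199254740992.0); // 2^53
}

int main()
{
    using namespace std::chrono;
    double total = 0.0;
    auto beg = high_resolution_clock::now();
    for (int i = 0; i < 100000000; i++)
    {
        double r1 = fast_rand();
        double r2 = fast_rand();
        total += r1 * r2;
    }
    auto end = high_resolution_clock::now();
    // Print total so the loop cannot be trivially removed by the compiler.
    printf("total=%f elapsed=%f s\n", total,
           duration_cast<duration<double>>(end - beg).count());
}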

1 Answer


You're calling the rand function from libc, which may be implemented differently in your Docker container.

For reliable results, either use the exact same OS and package versions on the host and in the container, or link libc statically using something like:

g++ -O0 -static-libstdc++ -static-libgcc test.cpp
Thomas