
I am very new to Docker and recently wrote a Dockerfile to containerize a mathematical optimization solver called SuiteOPT. However, when testing the solver on a few test problems I am experiencing slower performance in Docker than outside of it. For example, one demo problem, a linear program (demoLP.py), takes ~12 seconds to solve on my machine, but in Docker it takes ~35 seconds. I have spent about a week looking through blogs and Stack Overflow posts for solutions, but no matter what changes I make the timing in Docker is always ~35 seconds. Does anyone have any ideas what might be going on, or could anyone point me in the right direction?

Below are links to the docker hub and PYPI page for the optimization solver:

Docker hub for SuiteOPT

PYPI page for SuiteOPT

Edit 1: Adding an additional thought due to a comment from @user3666197. While I did not expect SuiteOPT to perform as well in the Docker container, I was mainly surprised by the ~3x slowdown for this demo problem. Perhaps the question can be restated as follows: How can I determine whether this slowdown is caused purely by the fact that I am executing CPU-RAM-I/O-intensive code inside a Docker container, rather than by some other issue with the configuration of my Dockerfile?
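One way to make the host-vs-container comparison less noisy is to time the solve itself, repeated several times, in both environments and compare medians rather than single runs. A minimal sketch (the `solve_demo` placeholder below is hypothetical; substitute the actual demoLP.py solve call):

```python
# Minimal repeat-timing harness for the A/B comparison (a sketch;
# "solve_demo" is a hypothetical stand-in for the demoLP.py workload).
import statistics
import time

def time_workload(fn, repeats=5):
    """Run fn() `repeats` times; return (median, stdev) of wall-clock seconds."""
    samples = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - t0)
    return statistics.median(samples), statistics.pstdev(samples)

if __name__ == "__main__":
    def solve_demo():
        # placeholder CPU-bound workload; replace with the SuiteOPT solve
        sum(i * i for i in range(10**6))

    median, spread = time_workload(solve_demo)
    print(f"median={median:.3f}s stdev={spread:.3f}s")
```

Running the same script on bare metal and inside the container gives a like-for-like pair of medians; a large stdev in the container run is itself a hint of contention or throttling.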

Note: The purpose of this containerization is to provide a simple way for users to get started with the optimization software in Python. While the optimization software is available on PyPI, there are many non-Python dependencies that could cause installation issues for people wishing to try the software.

chrundle
  • Besides the undoubtedly positive benefits of using containers for reasonably repetitive or mass deployment of pre-configured ready-to-use eco-systems in an almost COTS-fashion to the crowds, **what has led you to the assumption** that any such docker-containerisation technology will result in executing a {CPU- |RAM-I/O}-intensive code-inside-an-abstracted-container **without any negative externalities** - be it add-on costs from running the abstraction / containerisation -engine- ( read slower ) plus awfully wasted L1/L2/L3-Cache-Efficient-reuse effects, that will not happen inside a container? – user3666197 Feb 19 '20 at 22:42
  • @user3666197 Thank you for taking the time to respond. Some of your thoughts are touching on what I am curious about. I will say that I did not expect SuiteOPT to perform as well in the docker container, but I am just surprised by the ~3x slowdown for this demo problem. I suppose my concern can be restated as follows: **How can I determine whether this slowdown is caused purely by the fact that I am executing CPU-RAM-I/O-intensive code inside a docker container, rather than by some other issue with the configuration of my Dockerfile?** I will edit my post to add this thought. – chrundle Feb 20 '20 at 02:49
  • You may have to dig more by analyzing your specific case with a tool like `perf`. For example in this article: [Another reason why your Docker containers may be slow](https://hackernoon.com/another-reason-why-your-docker-containers-may-be-slow-d37207dec27f) the performance was bad due to a library used for logging. To visually see what `perf record ...` captures check [Flame Graphs](http://www.brendangregg.com/flamegraphs.html) and [Netflix FlameScope](https://github.com/Netflix/flamescope) – tgogos Feb 20 '20 at 09:36
  • Always welcome @chrundle. As Anastasios has posted above, the awfully adverse inefficiency comes from the fact of immense cross-dependency of C-groups **sharing** - the **biggest sin in performance hunting in [tag:distributed-systems]** - Let me propose an A / B / C test - run the same on the bare metal [A] + next inside a VM (may use VmWare tool for "packing" the bare-metal as-is into VM + VmWare Player for private use), still on the same bare metal device [B] + the same as a container [C] --- If performance matters, the VM-isolation v/s Docker-shared C-groups approach data will tell you. – user3666197 Feb 20 '20 at 10:30
  • @tgogos Thank you for sharing that article and for the flame graphs information. I am currently familiarizing myself with `perf` and the flame graphs you shared. Seems like a promising direction. I will be sure to share any progress I make. Thanks again. – chrundle Feb 21 '20 at 13:59
  • I've taken some notes here: [github.com/tgogos/flamescope_test](https://github.com/tgogos/flamescope_test). I'm not sure but they might help :-) – tgogos Feb 21 '20 at 14:04
  • @user3666197 Your A / B / C test sounds like a good idea. I will try to get around to setting up and running the B test soon. – chrundle Feb 21 '20 at 14:05
  • @tgogos - Nice! Is there visual distinction of what part of the FlameScope "camel-mountain" is a **S**ystem-**u**nder-**T**est ( our useful workload ) and what are the add-on costs of both the **SuT**-containerisation + the *shared exposition* to concurrent processes in the whole **exo**-system ( the O/S + other containerised workloads )? **That is the very measure of efficiency** - how much do we have to pay for having the **SuT** run not on bare metal, but inside a container + what is the level of concurrent load (blocking) our SuT from running as free as it runs on a bare metal (private) host? – user3666197 Feb 21 '20 at 14:19
  • @user3666197 I've also done some experimentation to see the cost of using a docker bridge / publishing a port versus using `--net=host` which you can find at this question: [Performance issues running nginx in a docker container](https://stackoverflow.com/questions/49023800/performance-issues-running-nginx-in-a-docker-container). I had to run the test twice, do 2 distinct `perf record ...` and compare the results of 2 different flame graphs later. I think the extra time spent when the container has its own network stack is obvious in the pictures. – tgogos Feb 21 '20 at 14:38

1 Answer


Q : How can I determine whether this slowdown is caused purely by the fact that I am executing CPU-RAM-I/O-intensive code inside a docker container, rather than by some other issue with the configuration of my Dockerfile?

The battlefield :

[ Image: map of Linux performance observability tools ( Credits: Brendan GREGG ) ]

Step 0 : collect data about the Host-side run processing :

mpstat -P ALL 1   ### 1 [s] sampled CPU counters in one terminal-session (may log to file)

python demoLP.py  # <TheWorkloadUnderTest> expected ~ 12 [s] on bare metal system
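If logging mpstat output is inconvenient, the same overall utilization figure can be cross-checked by diffing two snapshots of the aggregate `cpu` line in `/proc/stat`. A minimal sketch; the two sample lines below are illustrative, not measured:

```python
# Sketch: derive overall CPU busy% from two snapshots of the aggregate
# "cpu" line in /proc/stat (fields: user nice system idle iowait irq
# softirq steal guest guest_nice, in clock ticks). Sample lines are
# illustrative stand-ins for two reads taken before and after the run.
def busy_percent(stat_line_t0, stat_line_t1):
    a = [int(x) for x in stat_line_t0.split()[1:]]
    b = [int(x) for x in stat_line_t1.split()[1:]]
    total = sum(b) - sum(a)
    idle = (b[3] + b[4]) - (a[3] + a[4])   # idle + iowait deltas
    return 100.0 * (total - idle) / total if total else 0.0

t0 = "cpu  1000 0 500 8000 100 0 50 0 0 0"
t1 = "cpu  1400 0 700 8400 120 0 70 0 0 0"
print(f"busy: {busy_percent(t0, t1):.1f}%")  # → busy: 59.6%
```

On a live Linux host, read `/proc/stat` once before and once after `python demoLP.py` and feed the two `cpu` lines to the helper.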

Step 1 : collect data about the same processing but inside the Docker-container

plus review policies set in --cpus and --cpu-shares ( potentially --memory + --kernel-memory if used )
plus review effects shown in throttled_time ( ref. Pg.13 )

cat /sys/fs/cgroup/cpu,cpuacct/cpu.stat
nr_periods 0
nr_throttled 0
throttled_time 0 <-------------------------------------------------[*] increasing?
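The "increasing?" check can be made concrete by diffing two reads of `cpu.stat` taken before and after the run. A sketch, assuming the cgroup v1 layout shown above; the sample values below are illustrative, not measured:

```python
# Sketch: diff two reads of the container's cpu.stat to quantify CFS
# throttling during the run (cgroup v1 layout assumed; values illustrative).
def parse_cpu_stat(text):
    """Parse 'key value' lines of cpu.stat into a dict of ints."""
    return {k: int(v) for k, v in (line.split() for line in text.strip().splitlines())}

before = "nr_periods 120\nnr_throttled 0\nthrottled_time 0"
after  = "nr_periods 480\nnr_throttled 35\nthrottled_time 9100000000"

b, a = parse_cpu_stat(before), parse_cpu_stat(after)
throttled_s = (a["throttled_time"] - b["throttled_time"]) / 1e9  # ns -> s
print(f"throttled {a['nr_throttled'] - b['nr_throttled']} times, {throttled_s:.1f}s total")
```

If the `throttled_time` delta over a ~35 s solve is a large fraction of the slowdown, the CPU cap policy is the culprit rather than the Dockerfile.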

plus review the Docker-container's workload view-from-outside the box by :

cat /proc/<_PID_>/status | grep nonvolu ### in one terminal session
nonvoluntary_ctxt_switches: 6 <------------------------------------[*] increasing?

systemd-cgtop                           ### view <Tasks> <%CPU> <Memory> <In/s> <Out/s>
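The same before/after check applies to `nonvoluntary_ctxt_switches`. A sketch that parses two illustrative `/proc/<_PID_>/status` snapshots (on a live system, re-read the file before and after the run instead of using these sample strings):

```python
# Sketch: quantify nonvoluntary context switches over the run by parsing
# two snapshots of /proc/<pid>/status (samples below are illustrative).
def nonvoluntary_switches(status_text):
    for line in status_text.splitlines():
        if line.startswith("nonvoluntary_ctxt_switches:"):
            return int(line.split(":")[1])
    raise ValueError("nonvoluntary_ctxt_switches field not found")

sample_t0 = "Name:\tpython\nvoluntary_ctxt_switches:\t150\nnonvoluntary_ctxt_switches:\t6"
sample_t1 = "Name:\tpython\nvoluntary_ctxt_switches:\t160\nnonvoluntary_ctxt_switches:\t5400"

delta = nonvoluntary_switches(sample_t1) - nonvoluntary_switches(sample_t0)
print(f"nonvoluntary switches during run: {delta}")
```

A large delta means the kernel is preempting the solver, i.e. it is competing for CPU time rather than being slowed by anything in the image itself.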

Step 2 :

Check observed indications against the set absolute CPU cap policy and CPU-shares policy using the flowchart above
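For reference, the arithmetic behind the absolute CPU cap: under cgroup v1 CFS bandwidth control, `docker run --cpus=N` sets `cfs_quota_us = N * cfs_period_us`, capping the CPU time the container may consume per scheduling period. A sketch:

```python
# Sketch of the cgroup v1 arithmetic behind `docker run --cpus`:
# cfs_quota_us / cfs_period_us gives the effective number of CPUs' worth
# of time the container may use per period (default period: 100 ms).
def effective_cpus(cfs_quota_us, cfs_period_us=100_000):
    return -1 if cfs_quota_us < 0 else cfs_quota_us / cfs_period_us

print(effective_cpus(150_000))  # --cpus=1.5 -> quota 150000 us per 100000 us period
print(effective_cpus(-1))       # quota -1 means unlimited (no cap set)
```

If the flowchart check points at the cap, compare this effective-CPU figure against the core count the solver saturates on bare metal.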

user3666197