
I'd like to know whether it's possible to change, at (Python) runtime, the maximum number of threads used by OpenBLAS behind numpy.

I know it's possible to set it before running the interpreter through the environment variable OMP_NUM_THREADS, but I'd like to change it at runtime.
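As an aside on the environment-variable approach: OpenBLAS reads these variables when the library is first loaded, so they can also be set from inside Python, as long as this happens before numpy is imported. Changing `os.environ` afterwards has no effect on the already-loaded library, which is why this does not solve the runtime problem. A minimal sketch; the helper name `set_blas_env_threads` is hypothetical, and `OPENBLAS_NUM_THREADS` and `MKL_NUM_THREADS` are set alongside `OMP_NUM_THREADS` on the assumption that which one is honored depends on the BLAS build:

```python
import os
import sys
import warnings

# Variables read by common BLAS builds when the library is loaded; which one
# takes precedence depends on the library and how it was built.
_THREAD_ENV_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")


def set_blas_env_threads(n):
    """Hypothetical helper: request n BLAS threads via environment variables.

    Only effective if called before numpy (and hence BLAS) is loaded.
    """
    # Rough check: other packages (e.g. scipy) could also have loaded BLAS.
    if "numpy" in sys.modules:
        warnings.warn("numpy is already imported; the loaded BLAS library "
                      "will ignore these variables")
    for var in _THREAD_ENV_VARS:
        os.environ[var] = str(int(n))
```

It would be called at the very top of the main script, before `import numpy`.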

For comparison, this is possible when using MKL instead of OpenBLAS:

import mkl
mkl.set_num_threads(n)
ali_m
Théo T
  • You can try calling the `openblas_set_num_threads` function via the `ctypes` module, similar to [this question](http://stackoverflow.com/q/28283112/2379410). – Apr 10 '15 at 18:21

2 Answers


You can do this by calling the openblas_set_num_threads function using ctypes. I often find myself wanting to do this, so I wrote a little context manager:

import contextlib
import ctypes
from ctypes.util import find_library

# Prioritize hand-compiled OpenBLAS library over version in /usr/lib/
# from Ubuntu repos
try_paths = ['/opt/OpenBLAS/lib/libopenblas.so',
             '/lib/libopenblas.so',
             '/usr/lib/libopenblas.so.0',
             find_library('openblas')]
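openblas_lib = None
for libpath in try_paths:
    if libpath is None:
        # find_library() returns None when nothing is found, and on POSIX
        # LoadLibrary(None) would silently load the main program instead
        continue
    try:
        openblas_lib = ctypes.cdll.LoadLibrary(libpath)
        break
    except OSError:
        continue
if openblas_lib is None:
    raise OSError('Could not locate an OpenBLAS shared library')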


def set_num_threads(n):
    """Set the current number of threads used by the OpenBLAS server."""
    openblas_lib.openblas_set_num_threads(int(n))


# At the time of writing these symbols were very new:
# https://github.com/xianyi/OpenBLAS/commit/65a847c
# ctypes raises AttributeError for a missing symbol, so hasattr() can
# check for it without calling the function.
if hasattr(openblas_lib, 'openblas_get_num_threads'):
    def get_num_threads():
        """Get the current number of threads used by the OpenBLAS server."""
        return openblas_lib.openblas_get_num_threads()
else:
    def get_num_threads():
        """Dummy function (symbol not present), returns -1."""
        return -1

if hasattr(openblas_lib, 'openblas_get_num_procs'):
    def get_num_procs():
        """Get the number of processors detected by OpenBLAS."""
        return openblas_lib.openblas_get_num_procs()
else:
    def get_num_procs():
        """Dummy function (symbol not present), returns -1."""
        return -1


@contextlib.contextmanager
def num_threads(n):
    """Temporarily changes the number of OpenBLAS threads.

    Example usage:

        print("Before: {}".format(get_num_threads()))
        with num_threads(n):
            print("In thread context: {}".format(get_num_threads()))
        print("After: {}".format(get_num_threads()))
    """
    old_n = get_num_threads()
    set_num_threads(n)
    try:
        yield
    finally:
        set_num_threads(old_n)

You can use it like this:

import numpy as np
x = y = np.random.rand(1000, 1000)  # example operands
with num_threads(8):
    np.dot(x, y)

As mentioned in the code comments, openblas_get_num_threads and openblas_get_num_procs were very new features at the time of writing, and might therefore not be available unless you compiled OpenBLAS from the latest version of the source code; in that case the dummy versions above return -1.
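The symbol checks above rely on ctypes raising `AttributeError` when a name is missing from a loaded library, so `hasattr` (or `try`/`except AttributeError`) can probe for optional functions before calling them. A small self-contained illustration that uses the C standard library as a stand-in for OpenBLAS; it assumes a Unix-like system where `find_library('c')` works, and `has_symbol` is a hypothetical helper name:

```python
import ctypes
from ctypes.util import find_library


def has_symbol(lib, name):
    """Return True if the loaded shared library exports `name`."""
    # CDLL.__getattr__ raises AttributeError for missing symbols, which
    # hasattr() turns into False without ever calling the function.
    return hasattr(lib, name)


# e.g. 'libc.so.6' on glibc-based Linux
libc = ctypes.CDLL(find_library("c"))

print(has_symbol(libc, "strlen"))                    # True
print(has_symbol(libc, "openblas_get_num_threads"))  # False
```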

ali_m
  • Note that as of v0.2.14, the pthreads build's openblas_get_num_procs does not account for CPU affinity, so it can lead to oversubscription when the number of usable CPUs is restricted (e.g. in containers). Use len(os.sched_getaffinity(0)) (Python >= 3.3, not available on all platforms) instead. – jtaylor May 28 '15 at 07:46
  • @jtaylor Great idea. I'm wondering whether it is possible to change the thread binding at runtime. For example, I want some operations to run with 8 threads on the 1st CPU socket, and others to run with a single thread on the 2nd CPU socket. – Y00 Jan 16 '21 at 17:04
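Following jtaylor's suggestion, a sketch of deriving a thread count from the CPUs this process is actually allowed to run on. `os.sched_getaffinity` exists only on some platforms (e.g. Linux), so the sketch falls back to `os.cpu_count()` elsewhere; the function name `usable_cpu_count` is hypothetical:

```python
import os


def usable_cpu_count():
    """Number of CPUs this process may run on, respecting affinity where supported."""
    if hasattr(os, "sched_getaffinity"):
        # Honors the affinity mask, e.g. one set by taskset or a container cpuset
        return len(os.sched_getaffinity(0))
    # Fallback: all logical CPUs, which may overcount under restrictions
    return os.cpu_count() or 1


print(usable_cpu_count())
```

The result could then be passed to set_num_threads() from the answer above instead of openblas_get_num_procs().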

We recently developed threadpoolctl, a cross-platform package to control the number of threads used in calls to C-level thread pools in Python. It works similarly to the answer by @ali_m, but automatically detects the libraries that need to be limited by looping through all loaded libraries. It also comes with introspection APIs.

This package can be installed using pip install threadpoolctl and comes with a context manager that allows you to control the number of threads used by packages such as numpy:

from threadpoolctl import threadpool_limits
import numpy as np


with threadpool_limits(limits=1, user_api='blas'):
    # In this block, calls to blas implementation (like openblas or MKL)
    # will be limited to use only one thread. They can thus be used jointly
    # with thread-parallelism.
    a = np.random.randn(1000, 1000)
    a_squared = a @ a

You can also have finer control over different thread pools (such as differentiating BLAS from OpenMP calls).
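A sketch of that introspection and per-API control: `threadpool_info()` lists the thread pools threadpoolctl has detected, and `threadpool_limits` also accepts a dict mapping a `user_api` such as `'blas'` or `'openmp'` to a limit. The helper names `describe_threadpools` and `limit_pools` are hypothetical, and the import is guarded so the snippet still runs (as a no-op) where threadpoolctl is not installed:

```python
from contextlib import nullcontext

try:
    from threadpoolctl import threadpool_info, threadpool_limits
except ImportError:  # threadpoolctl missing: the helpers below become no-ops
    threadpool_info = threadpool_limits = None


def describe_threadpools():
    """Return (user_api, internal_api, num_threads) for each detected pool."""
    if threadpool_info is None:
        return []
    return [(pool["user_api"], pool["internal_api"], pool["num_threads"])
            for pool in threadpool_info()]


def limit_pools(blas=None, openmp=None):
    """Context manager limiting BLAS and OpenMP pools independently.

    A value of None leaves that kind of pool untouched.
    """
    limits = {api: n for api, n in (("blas", blas), ("openmp", openmp))
              if n is not None}
    if threadpool_limits is None or not limits:
        return nullcontext()
    return threadpool_limits(limits=limits)


print(describe_threadpools())
with limit_pools(blas=1, openmp=2):
    pass  # e.g. BLAS-heavy numpy calls inside an OpenMP-parallel section
```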

Note: this package is still in development and any feedback is welcome.

Thomas Moreau