
Every invocation of R creates 63 threads (pstree shows them as {R} children of the R process):

Rscript --vanilla  -e 'Sys.sleep(5)' &  pstree -p $! | grep -c '{R}'
# 63

where the pstree output looks something like this:

R(2562809)─┬─{R}(2562818)
           ├─{R}(2562819)
           ...
           ├─{R}(2562878)
           ├─{R}(2562879)
           └─{R}(2562880)
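
The same number can be read straight from /proc without pstree. This is a minimal sketch that assumes Linux; count_threads is a hypothetical helper name. pstree lists every thread except the main one as {R}, so its result should be one more than the {R} count above.

```shell
# Hypothetical helper (Linux only): print how many threads process PID has,
# taken from the "Threads:" field of /proc/PID/status.
# This includes the main thread, which pstree does not print as {R}.
count_threads() {
    awk '/^Threads:/ { print $2 }' "/proc/$1/status"
}

# Usage against a short-lived R process, as in the question
# (the sleep gives R time to load its libraries first):
#   Rscript --vanilla -e 'Sys.sleep(5)' & sleep 1; count_threads $!
```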

Is this expected behavior?

This is a 72-core machine with Debian 9.3, R 3.4.3, BLAS 3.7.0 and OpenMPI 2.0.2:

dpkg-query -l '*blas*' 'r-base' '*lapack*' '*openmp*'|grep ^ii
ii  libblas-common     3.7.0-2                    amd64        Dependency package for all BLAS implementations
ii  libblas-dev        3.7.0-2                    amd64        Basic Linear Algebra Subroutines 3, static library
ii  libblas3           3.7.0-2                    amd64        Basic Linear Algebra Reference implementations, shared library
ii  liblapack-dev      3.7.0-2                    amd64        Library of linear algebra routines 3 - static version
ii  liblapack3         3.7.0-2                    amd64        Library of linear algebra routines 3 - shared version
ii  libopenblas-base   0.2.19-3                   amd64        Optimized BLAS (linear algebra) library (shared library)
ii  libopenmpi-dev     2.0.2-2                    amd64        high performance message passing library -- header files
ii  libopenmpi2:amd64  2.0.2-2                    amd64        high performance message passing library -- shared library
ii  libopenmpt0:amd64  0.2.7386~beta20.3-3+deb9u2 amd64        module music library based on OpenMPT -- shared library
ii  openmpi-bin        2.0.2-2                    amd64        high performance message passing library -- binaries
ii  openmpi-common     2.0.2-2                    all          high performance message passing library -- common files
ii  r-base             3.4.3-1~stretchcran.0      all          GNU R statistical computation and graphics system

R is using the OpenBLAS and OpenMP (libgomp) libraries:

Rscript --vanilla -e 'Sys.sleep(1)' &  lsof -p $!  | grep -E -i 'blas|lapack|parallel|omp'
[1] 2574896
lsof: WARNING: can't stat() tracefs file system /sys/kernel/debug/tracing
    Output information may be incomplete.
R       2574896 foranw  mem    REG   0,20          13931603 /usr/lib/libopenblasp-r0.2.19.so (path dev=0,21)
R       2574896 foranw  mem    REG   0,20          13931604 /usr/lib/openblas-base/libblas.so.3 (path dev=0,21)
R       2574896 foranw  mem    REG   0,20          13840156 /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0 (path dev=0,21)
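
A similar check can be made by reading the process's memory map directly, which also avoids the lsof tracefs warning. A minimal sketch assuming Linux /proc; mapped_libs is a hypothetical helper name, and the pattern is the one used with lsof above.

```shell
# Hypothetical helper (Linux only): list the distinct files mapped into
# process PID whose path matches the extended regex PATTERN (case-insensitive).
mapped_libs() {
    # Field 6 of /proc/PID/maps is the backing path (or a pseudo-name such
    # as [heap]); it is empty for anonymous mappings, which are skipped.
    awk '$6 != "" { print $6 }' "/proc/$1/maps" | grep -E -i "$2" | sort -u
}

# Usage, mirroring the lsof check above:
#   Rscript --vanilla -e 'Sys.sleep(5)' & sleep 1
#   mapped_libs $! 'blas|lapack|parallel|omp'
```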
Bracula
Will

2 Answers


R is (famously) single-core.

I suspect this comes from libopenblas-base, which is (also known to be) multi-core.

Contrast this with our Rocker container (https://www.rocker-project.org/), which uses libblas3 (single-threaded, not optimized):

> system("pstree")
bash───R───sh───pstree
> system("ps -ax")
  PID TTY      STAT   TIME COMMAND
    1 pts/0    Ss     0:00 /bin/bash
  579 pts/0    S+     0:00 /usr/lib/R/bin/exec/R
  583 pts/0    S+     0:00 sh -c ps -ax
  584 pts/0    R+     0:00 ps -ax
> 

As the Debian maintainer for R, I take advantage of the fact that we have several BLAS/LAPACK builds. Base can be OK, OpenBLAS is often faster (but be careful when you then use multiple cores from R via the different parallel mechanisms), and there is also ATLAS. What is "best" will always get a firm "it depends".
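
To put rough numbers on that caution: if each of 8 forked workers ran a pool the size observed in the question (63 extra threads), that would be about 8 × 63 = 504 threads competing for 72 cores. One common rule of thumb, which is our assumption and not something the answer prescribes, is to give each worker about cores / workers BLAS threads. threads_per_worker is a hypothetical helper.

```shell
# Hypothetical helper: split the usable cores evenly between WORKERS
# parallel R workers, never returning fewer than 1 thread per worker.
# Note: GNU nproc also honours OMP_NUM_THREADS if it is already set,
# so call this before exporting that variable.
threads_per_worker() {
    _tpw=$(( $(nproc) / $1 ))
    if [ "$_tpw" -lt 1 ]; then _tpw=1; fi
    echo "$_tpw"
}

# On the 72-core machine from the question, 8 workers would get 72 / 8 = 9.
# Example (my_parallel_job.R is a hypothetical script using mclapply):
#   OPENBLAS_NUM_THREADS=$(threads_per_worker 8) Rscript my_parallel_job.R
```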

Dirk Eddelbuettel
  • Thanks for maintaining deb's R (and answering the question)! I recently ran `apt dist-upgrade` and then noticed `mclapply`ing over blas enabled models forking into way too many jobs. Is it likely the upgrade changed the blas install? is there a way to specify which library R should use -- `julia` and some python libs depend on `libopenblas-base`? PS. rocker.org is a dead link? – Will Jan 23 '18 at 20:47
  • Sorry, was in a hurry: https://www.rocker-project.org/ (now edited in answer too). And BLAS etc have distro-wide defaults. Ie if you install libatlas* and remove libopenblas* everything will work as before -- but not fork. I sometimes do that for the same reason, but wish there was a middle ground. Maybe bring the question to the `r-sig-debian` list. – Dirk Eddelbuettel Jan 23 '18 at 20:49
  • I am tied up with something else, but check the manual page / some wikis on `update-alternatives`. There is a given default ranking of the available BLAS; you can override it if you so choose and it will remember. – Dirk Eddelbuettel Jan 23 '18 at 20:55
  • I generally set my own number of threads when a multithreaded BLAS is active by default. I use `library(RhpcBLASctl)` and `blas_set_num_threads(n)`, where `n` is the number of threads you want. – Mohamad Elmasri Feb 03 '18 at 09:59
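
The alternatives system mentioned in the comments can be queried from the shell. This is a minimal sketch assuming a Debian-style `update-alternatives`; because the BLAS/LAPACK link-group names differ between releases, it filters the selections by name instead of hard-coding one. show_blas_alternatives is a hypothetical helper name.

```shell
# Hypothetical helper: show which implementation each BLAS/LAPACK
# alternatives link group currently points at (Debian/Ubuntu).
show_blas_alternatives() {
    if ! command -v update-alternatives >/dev/null 2>&1; then
        echo "update-alternatives not found (not a Debian-style system?)"
        return 0
    fi
    # --get-selections prints: <group name> <auto|manual> <current target>
    sel=$(update-alternatives --get-selections 2>/dev/null | grep -E -i 'blas|lapack' || true)
    if [ -n "$sel" ]; then
        echo "$sel"
    else
        echo "no BLAS/LAPACK alternatives registered"
    fi
}

# To choose a different implementation for one of the printed groups
# (system-wide; it is remembered until changed again):
#   sudo update-alternatives --config <group-name>
```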

Setting the BLAS/OpenMP environment variables (30791550) can control the allocation. I'm still not sure whether the observed 'use most of the cores' default is intentional/reasonable.

export OPENBLAS_NUM_THREADS=4 OMP_NUM_THREADS=4 MKL_NUM_THREADS=4
Rscript --vanilla -e 'Sys.sleep(1)' &  pstree -p $! |wc -l
# 3
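
The same caps can be scoped to a single command instead of being exported for the whole shell session. A minimal sketch; run_with_threads is a hypothetical wrapper, and the variable names are the ones from the export line above (OPENBLAS_NUM_THREADS is read by OpenBLAS, OMP_NUM_THREADS by OpenMP runtimes such as libgomp, MKL_NUM_THREADS by Intel MKL).

```shell
# Hypothetical wrapper: run one command with BLAS/OpenMP thread caps set
# only in that command's environment, leaving the current shell untouched.
run_with_threads() {
    _n=$1
    shift
    OPENBLAS_NUM_THREADS=$_n OMP_NUM_THREADS=$_n MKL_NUM_THREADS=$_n "$@"
}

# Usage:
#   run_with_threads 4 Rscript --vanilla -e 'Sys.sleep(1)'
```

Backgrounding the wrapper with `&` may make `$!` refer to a subshell rather than to R, so the exported form above remains the simpler choice when piping into pstree.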
Will
  • There are two packages on CRAN that allow you to do this from within R as well: [RhpcBLASctl](https://cran.r-project.org/package=RhpcBLASctl) and IIRC another one whose name I can never remember at first... – Dirk Eddelbuettel Jan 23 '18 at 20:45