
For this question, please assume the Dockerfile requires a multi-stage build in which stage one uses rocker/r-ver:4.1.1 as its base image and stage two uses mcr.microsoft.com/mssql/server:2017-latest. The Dockerfile is below:

# Stage 1: Build stage
FROM rocker/r-ver:4.1.1 AS build

RUN /rocker_scripts/install_tidyverse.sh

# Declare environment variables
ENV PATH=/root/.local/bin:$PATH \
    ACCEPT_EULA=y \
    DEBIAN_FRONTEND=noninteractive \
    TESSDATA_PREFIX=/usr/share/tesseract-ocr/4.00/

# Install system libraries
RUN apt-get update \
  && apt-get install -y --allow-downgrades binutils \
  && apt-get install -y gnupg apt-transport-https curl apt-utils libgsl-dev libfontconfig1-dev libmariadb-dev tdsodbc  \
    libmagick++-dev libavfilter-dev cmake cargo libpoppler-cpp-dev libtesseract-dev libleptonica-dev tesseract-ocr-eng

# Install R packages
RUN Rscript -e "install.packages(c('RMySQL', 'ragg', 'RMariaDB', 'nloptr', 'tidyverse', 'lme4', 'phyr', 'odbc',  \
    'MEMSS', 'mlmRev', 'gamm4', 'pbkrtest', 'semEff', 'merDeriv', 'car', 'rr2', 'magick', 'av'), dependencies = TRUE)"

# Copy the files
COPY . /app

# Stage 2: Final stage
FROM mcr.microsoft.com/mssql/server:2017-latest

# Set environment variables
ENV ACCEPT_EULA=Y PATH=/usr/local/bin/R:$PATH DEBIAN_FRONTEND=noninteractive

# Install system libraries
# This has to be done a second time because the libraries are not copied over properly (the SQL image seems to erase some).
# It is still necessary in the first stage as well, since the R packages can't be installed otherwise.
RUN apt-get update \
  && apt-get install -y --allow-downgrades binutils \
  && apt-get install -y gnupg apt-transport-https curl apt-utils libgsl-dev libfontconfig1-dev libmariadb-dev tdsodbc  \
    libmagick++-dev libavfilter-dev cmake cargo libpoppler-cpp-dev libtesseract-dev libleptonica-dev tesseract-ocr-eng \
    r-base r-base-dev

# Copy the files
COPY --from=build /app /app

# Set working directory
WORKDIR /app

# Set the entrypoint command
CMD ["Rscript", "run.R"]

As you can see, I am installing the system libraries in both the first and second stages. Is it possible to avoid this redundancy by copying the installation over somehow? COPY --from=build /app /app does not accomplish this, and I have tried replacing the second installation of the system libraries with the commands below, to no avail:

# Copy the R binary from build stage
COPY --from=build /usr/local/bin/R /usr/local/bin/R

# Copy the R libraries from build stage
COPY --from=build /usr/local/lib/R /usr/local/lib/R

Does anyone know how I can avoid installing the system libraries in both stages?

work89
  • What's your use case for running an R application, but based on a database server image, but where the database isn't running? Can you delete the second stage entirely? – David Maze May 26 '23 at 18:40
  • @DavidMaze I am going to run an R script that connects to a SQL Server instance using ODBC Driver 17 for SQL Server. The problem is that I need a specific version of R (4.1.1). I have tried deleting the second stage and installing the msodbcsql17 package and its dependencies in the first stage, but there seems to be a bug in how the driver is installed on Ubuntu, based on these links: https://github.com/microsoft/msphpsql/issues/252 and https://stackoverflow.com/questions/74708033/error-code-0x2746-10054-when-trying-to-connect-to-sql-server-2014-via-odbc-fro – work89 May 26 '23 at 18:46
  • The last `FROM` line says "this container is a database", not "this container connects to a database". You need an ordinary ODBC client library, but you do not need to build it into a database image in order to use it. – David Maze May 26 '23 at 22:58

1 Answer


In case anyone has a similar issue, here is my Dockerfile after the change:

FROM rocker/tidyverse:4.1.1

# Declare environment variables
ENV PATH=/root/.local/bin:$PATH \
    ACCEPT_EULA=y \
    DEBIAN_FRONTEND=noninteractive \
    TESSDATA_PREFIX=/usr/share/tesseract-ocr/4.00/

# Install system libraries
RUN apt-get update \
  && apt-get install -y --allow-downgrades binutils libmagick++-dev libavfilter-dev cmake cargo libpoppler-cpp-dev \
    libtesseract-dev libleptonica-dev tesseract-ocr-eng gnupg2 curl gnupg apt-transport-https tdsodbc

# Install R packages
RUN Rscript -e "install.packages(c('RMySQL', 'ragg', 'RMariaDB', 'nloptr', 'tidyverse', 'lme4', 'phyr', 'odbc',  \
    'MEMSS', 'mlmRev', 'gamm4', 'pbkrtest', 'semEff', 'merDeriv', 'car', 'rr2', 'magick', 'av', 'foreach',  \
    'doParallel', 'glue', 'data.table', 'dplyr', 'DBI'), dependencies = TRUE)"

# Set environment variables for ODBC configuration
ENV ODBCSYSINI=/etc

# Copy your odbcinst.ini configuration file to the container
COPY odbcinst.ini /etc/odbcinst.ini

COPY . .

CMD ["Rscript", "Run.R"]

And below is my odbcinst.ini file:

[ODBC Driver 17 for SQL Server]
Description=Microsoft ODBC Driver 17 for SQL Server
Driver=/usr/lib/x86_64-linux-gnu/odbc/libtdsodbc.so
UsageCount=1

In the final Dockerfile I install tdsodbc (the FreeTDS ODBC driver), removed Microsoft's ODBC driver, and updated the ini file so that the driver entry points to the new library. This also let me skip the multi-stage build.
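
Because the ini entry keeps Microsoft's driver name while pointing at libtdsodbc.so, it can be worth checking that the Driver= path really exists inside the image. The following is only a minimal sketch in POSIX shell, not part of my actual setup: odbc_driver_path is a hypothetical helper name, and it assumes a simple ini layout with one Driver= line per section and no trailing whitespace on section headers.

```shell
#!/bin/sh
# Hypothetical helper (an illustration, not part of the original answer):
# print the Driver= value registered under one section of an odbcinst.ini file.
# Usage: odbc_driver_path <ini-file> <section-name-without-brackets>
odbc_driver_path() {
    ini_file=$1
    section=$2
    awk -v want="[$section]" '
        # A line starting with "[" opens a new section; remember whether it is ours.
        /^\[/ { in_section = ($0 == want); next }
        # Inside the wanted section, strip the key and print the first Driver path.
        in_section && /^Driver[ \t]*=/ {
            sub(/^Driver[ \t]*=[ \t]*/, "")
            print
            exit
        }
    ' "$ini_file"
}
```

Inside the container this could be used as `[ -f "$(odbc_driver_path /etc/odbcinst.ini 'ODBC Driver 17 for SQL Server')" ] || echo "driver library missing"`. The x86_64-linux-gnu directory in the path is architecture-specific, so an image built for another architecture (for example aarch64-linux-gnu) would need a different Driver= value.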

work89