Dockerfile best practices gives the following recommendation for apt-get update usage:

Always combine RUN apt-get update with apt-get install in the same RUN statement.

Should I apply the same rule to pip install -U pip? That is, which practice is preferable inside a Dockerfile:

# Separate statements
RUN pip install -U pip
RUN pip install opencv-python==4.4.0.46

# Single statement
RUN pip install -U pip && pip install opencv-python==4.4.0.46
NShiny
  • You are specifying the version of the package to install so I doubt that it would matter. – Joe Apr 19 '21 at 15:19

2 Answers

The takeaway from the Dockerfile best practices is to remove caches in the same layer that they are created. With pip, this means using the --no-cache-dir option or explicitly removing the cache after installs.

The two code snippets lead to the same final image size. But Docker also recommends using fewer layers when possible, so I would prefer the version with a single RUN instruction.

# Single statement (preferred: one layer)
RUN pip install -U --no-cache-dir pip \
    && pip install --no-cache-dir opencv-python==4.4.0.46

# Separate statements (two layers)
RUN pip install -U --no-cache-dir pip
RUN pip install --no-cache-dir opencv-python==4.4.0.46

The recommendation about apt-get update && apt-get install ... doesn't really apply to pip. Apt relies on package index files that are downloaded by apt-get update, and good practice is to remove these files after any apt-get install, in the same RUN instruction. This reduces Docker image size. Pip has no separate index-update step: it queries the package index over the network each time it installs.
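For comparison, here is a minimal sketch of the apt pattern described above: update, install, and clean up in a single RUN instruction, so the index files never persist in a layer. The base image and the package name (libgl1) are illustrative assumptions, not taken from the question.

```dockerfile
# Base image chosen only for illustration
FROM python:3.9-slim

# Update the index, install, and delete the index files in ONE layer.
# libgl1 is a placeholder package name.
RUN apt-get update \
    && apt-get install -y --no-install-recommends libgl1 \
    && rm -rf /var/lib/apt/lists/*
```

If the rm ran in a separate RUN instruction, the index files would still be stored in the earlier layer and the image would not get smaller.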

jkr

You should refer to this SO answer. In short, you should always apt-get update before you apt-get install. Putting these commands in separate RUN instructions means that Docker caches them as separate layers. If you re-run the Docker build, Docker may reuse the cached apt-get update layer and thus install outdated packages.

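This is not the same with pip: it has no locally cached package index, so it looks up the latest available version at install time (unless you specify a version explicitly). In the case you have given, you can shorten the statement to RUN pip install -U pip opencv-python==4.4.0.46. Note that in this combined form the pip that is already installed performs both installs; the upgraded pip only takes effect for later commands.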

As a matter of convenience, you might want to group your installs into different RUN statements, so that the Dockerfile is easier to read and you don't have to reinstall everything when you add or remove a package. For example, if you have pip and opencv in a single RUN statement and you add pytest to that same statement, then the next docker build will reinstall pip and opencv as well. If, on the other hand, you split the statements like so:

RUN pip install -U pip opencv-python
RUN pip install -U pytest

and build, Docker will (by default) use the cached installs of pip and opencv and install only pytest. If you have many packages this is a serious time-saver.

The same thing applies to apt-get, by the way - the only catch is that, as explained, you would probably want to group apt-get install and apt-get update on one line for each group of package installs.
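As a rough sketch of that apt-get grouping, each group below gets its own RUN instruction, and each one pairs apt-get update with its install. The base image and package names are assumptions chosen only for illustration.

```dockerfile
# Base image chosen only for illustration
FROM debian:bullseye-slim

# Group 1: placeholder packages that rarely change
RUN apt-get update \
    && apt-get install -y --no-install-recommends build-essential \
    && rm -rf /var/lib/apt/lists/*

# Group 2: placeholder packages that change more often.
# Editing this instruction leaves the cached layer for group 1 reusable.
RUN apt-get update \
    && apt-get install -y --no-install-recommends git \
    && rm -rf /var/lib/apt/lists/*
```

The trade-off is an extra apt-get update per group in exchange for not rebuilding the earlier groups when a later one changes.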

In case you don't use caching (i.e. you run something like docker build . --no-cache), then it wouldn't matter whether you have everything on one line or on separate lines.

cyau