I define the following Docker image:

FROM python:3.6

RUN pip install --upgrade pip
RUN pip install --upgrade mlflow

ENTRYPOINT mlflow server --host 0.0.0.0 --file-store /mnt/mlruns/

and build an image called mlflow-server. Next, I start this server on my local machine:

docker run --rm -it -p 5000:5000 -v ${PWD}/mlruns/:/mnt/mlruns mlflow-server

Next, I define the following function:

import os
import mlflow

def foo(x, with_af=False):
    mlflow.start_run()
    mlflow.log_param("x", x)
    print(x)
    if with_af:
        with open(str(x), 'wb') as fout:
            fout.write(os.urandom(1024))
        mlflow.log_artifact(str(x))
    mlflow.end_run()

From the same directory I run foo(10) and the parameter is logged correctly. However, foo(10, True) yields the following error: PermissionError: [Errno 13] Permission denied: '/mnt'. It seems like log_artifact tries to save the file on the local file system directly.

Any idea what I am doing wrong?

Rene B.
Dror

3 Answers


Good question. Just to make sure, sounds like you're already configuring MLflow to talk to your tracking server when running your script, e.g. via MLFLOW_TRACKING_URI=http://localhost:5000 python my-script.py.
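As an alternative to the environment variable, the tracking URI can be set in code before logging anything. A minimal sketch, assuming the server from the question is reachable at localhost:5000:

```python
import os

# Equivalent to prefixing the command with MLFLOW_TRACKING_URI=...;
# mlflow reads this environment variable when resolving the tracking store.
# Calling mlflow.set_tracking_uri("http://localhost:5000") in code
# achieves the same thing.
os.environ["MLFLOW_TRACKING_URI"] = "http://localhost:5000"
```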

Artifact Storage in MLflow

Artifacts differ subtly from other run data (metrics, params, tags) in that the client, rather than the server, is responsible for persisting them. The current flow (as of MLflow 0.6.0) is:

  • User code calls mlflow.start_run
  • MLflow client makes an API request to the tracking server to create a run
  • Tracking server determines an appropriate root artifact URI for the run (currently: runs' artifact roots are subdirectories of their parent experiment's artifact root directories)
  • Tracking server persists run metadata (including its artifact root) & returns a Run object to the client
  • User code calls log_artifact
  • Client logs artifacts under the active run's artifact root
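To make the handoff above concrete, here is a minimal sketch (not actual MLflow code; the function names are illustrative) of how a server-assigned artifact root ends up being used as a local path by the client:

```python
import posixpath

def server_create_run(file_store, experiment_id, run_id):
    # The server builds the run's artifact root under its own file store --
    # a path inside the container, e.g. /mnt/mlruns -- and returns it.
    return posixpath.join(file_store, experiment_id, run_id, "artifacts")

def client_log_artifact(artifact_root, filename):
    # The client treats the returned root as a plain local filesystem path;
    # on the host, /mnt/mlruns may not exist or may not be writable.
    return posixpath.join(artifact_root, filename)

root = server_create_run("/mnt/mlruns", "0", "abc123")
print(client_log_artifact(root, "foo.data"))
# -> /mnt/mlruns/0/abc123/artifacts/foo.data, resolved on the client's machine
```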

The issue

When you launch an MLflow server via mlflow server --host 0.0.0.0 --file-store /mnt/mlruns/, the server logs metrics and parameters under /mnt/mlruns in the docker container, and also returns artifact paths under /mnt/mlruns to the client. The client then attempts to log artifacts under /mnt/mlruns on the local filesystem, which fails with the PermissionError you encountered.

The fix

The best practice for artifact storage with a remote tracking server is to configure the server to use an artifact root accessible to both clients and the server (e.g. an S3 bucket or Azure Blob Storage URI). You can do this via mlflow server --default-artifact-root [artifact-root].
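For example, a server invocation might look like the following (the bucket name is hypothetical; any URI that both the clients and the server can reach works):

```shell
# Hypothetical S3 bucket; clients need credentials for the same URI.
mlflow server \
    --host 0.0.0.0 \
    --file-store /mnt/mlruns \
    --default-artifact-root s3://my-mlflow-bucket/artifacts
```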

Note that the server uses this artifact root only when assigning artifact roots to newly-created experiments - runs created under existing experiments will use an artifact root directory under the existing experiment's artifact root. See the MLflow Tracking guide for more info on configuring your tracking server.

smurching
  • For running mlflow server in a container, you can use a Docker volume to mount a host directory as the container's artifact store. Then both the client and the server can access the same artifact folder. – suci Aug 30 '20 at 04:25
  • In my case, removing the `meta.yaml` file solved the problem – pdaawr Jun 27 '22 at 13:52

I had the same issue, try:

sudo chmod 755 -R /mnt/mlruns
docker run --rm -it -p 5000:5000 -v /mnt/mlruns:/mnt/mlruns mlflow-server

I had to create a folder with the exact same path as inside the Docker container and change its permissions.

I did the same inside the Docker image:

FROM python:3.6

RUN pip install --upgrade pip
RUN pip install --upgrade mlflow
RUN mkdir /mnt/mlruns/
RUN chmod 777 -R /mnt/mlruns/

ENTRYPOINT mlflow server --host 0.0.0.0 --file-store /mnt/mlruns/
Tiago Cabo

When the client sends REST API requests to log MLflow entities, the tracking server responds with store locations based on the paths configured inside the container. If those paths differ between container and host, the client ends up assuming the container's paths exist on the host, which causes permission errors.

Here is a docker-compose file that sets the default store locations at ${HOME}/mnt/mlruns:

services:
  web:
    restart: always
    build:
      context: ./mlflow
      args:
        - "MLFLOW_TRACKING_DIRECTORY=${HOME}/mnt/mlruns"
    image: mlflow_server
    container_name: mlflow_server
    ports:
      - "${MLFLOW_PORT:-5000}:5000"
    volumes:
      - "${HOME}/mnt/mlruns:${HOME}/mnt/mlruns"

Content of ./mlflow:

Dockerfile:

FROM python:3.10-slim-buster

ARG MLFLOW_TRACKING_DIRECTORY
ENV MLFLOW_TRACKING_DIRECTORY=${MLFLOW_TRACKING_DIRECTORY}

# Install python packages
COPY requirements.txt /tmp
RUN pip install -r /tmp/requirements.txt
# Optional: record the configured path for debugging
RUN echo ${MLFLOW_TRACKING_DIRECTORY} > test.txt

CMD mlflow server \
    --backend-store-uri ${MLFLOW_TRACKING_DIRECTORY}/tracking \
    --default-artifact-root ${MLFLOW_TRACKING_DIRECTORY}/artifacts \
    --host 0.0.0.0

requirements.txt:

mlflow==2.3.1

Make sure to set the permissions for ${HOME}/mnt/mlruns appropriately, as the client will access the local storage directly.
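A sketch of preparing that directory (the permission bits here are one reasonable choice, not the only one):

```shell
# Create the shared store and make it readable/writable for owner and group;
# the MLflow client writes artifacts into this directory directly.
mkdir -p "${HOME}/mnt/mlruns"
chmod -R u+rwX,g+rwX "${HOME}/mnt/mlruns"
```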

Hajar Razip