Setting-up MLflow on Google Colab

Question

I frequently use Google Colab to train TF/PyTorch models as Colab provides me with GPU/TPU runtime. Besides, I like working with MLflow to store and compare trained models, tracking progress, sharing, etc. What are the available solutions to use MLflow with Google Colab?

desertnaut · Answer 1 · 2023-05-21T12:27:33.390

There was a GitHub issue on this, and contributor dmatrix was kind enough to provide a notebook with a full solution using pyngrok.

Here is the code (meant to be run on a Colab notebook), reposted here with the implicit permission of the author:

!pip install mlflow --quiet
!pip install pyngrok --quiet

import mlflow

with mlflow.start_run(run_name="MLflow on Colab"):
  mlflow.log_metric("m1", 2.0)
  mlflow.log_param("p1", "mlflow-colab")

# run tracking UI in the background
get_ipython().system_raw("mlflow ui --port 5000 &")


# create remote tunnel using ngrok.com to allow local port access
# borrowed from https://colab.research.google.com/github/alfozan/MLflow-GBRT-demo/blob/master/MLflow-GBRT-demo.ipynb#scrollTo=4h3bKHMYUIG6

from pyngrok import ngrok

# Terminate open tunnels if exist
ngrok.kill()

# Setting the authtoken (optional)
# Get your authtoken from https://dashboard.ngrok.com/auth
NGROK_AUTH_TOKEN = ""
ngrok.set_auth_token(NGROK_AUTH_TOKEN)

# Open an HTTPS tunnel on port 5000 for http://localhost:5000
ngrok_tunnel = ngrok.connect(addr="5000", proto="http", bind_tls=True)
print("MLflow Tracking UI:", ngrok_tunnel.public_url)

The output is a pyngrok-generated URL like:

MLflow Tracking UI: https://0a23d7a7d0c4.ngrok.io

Clicking this URL opens the MLflow UI.

(Slight modification of the original code thanks to pyngrok creator, Alex Laird)

Tested with MLflow versions 1.10.0 and 1.11.0.

Dean Dean · Answer 2 · 2021-05-05T17:39:17.500

There are some good answers given, which have some downsides – mainly that you need to set up your own MLflow server for it to work.

TL;DR:

I would summarize by saying you have 2 options:

Option 1: Do everything yourself

For this option, I'm taking some of the code from desertnaut's answer (credit to dmatrix). Basically, if we use ngrok, we can divide the process into 3 steps:

Step 1: Set up an MLflow server, either locally, on Colab, or somewhere else.

pip install mlflow --quiet
mlflow ui --port 5000

or when running in a notebook:

!pip install mlflow --quiet
get_ipython().system_raw("mlflow ui --port 5000 &")

This starts an MLflow server. The downside of doing this in Colab is that the runtime there is ephemeral: when you close your session, all the experiment information is lost. You can run the command locally instead, but then tunneling in with ngrok might be more complex.
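One way to soften the ephemeral-runtime problem is to point MLflow's file-based store at a directory that outlives the session. The sketch below only builds such a tracking URI. The helper name persistent_tracking_uri and the Google Drive path mentioned afterwards are assumptions made for illustration, not something taken from the answers here.

```python
from pathlib import Path


def persistent_tracking_uri(base_dir: str) -> str:
    """Return a file:// tracking URI for an 'mlruns' folder under base_dir.

    Hypothetical helper: base_dir is assumed to be a directory that
    survives the Colab session (for example, a mounted Drive folder).
    """
    runs_dir = Path(base_dir).expanduser().resolve() / "mlruns"
    runs_dir.mkdir(parents=True, exist_ok=True)  # create the store if missing
    return runs_dir.as_uri()
```

In Colab this could be used as mlflow.set_tracking_uri(persistent_tracking_uri("/content/drive/MyDrive")) after mounting Drive with google.colab.drive.mount("/content/drive"), with mlflow ui started using --backend-store-uri pointing at the same folder.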

Step 2: Make it accessible for Colab and optionally add authentication. This can be done with ngrok. The code is as follows:

!pip install pyngrok --quiet

from pyngrok import ngrok
from getpass import getpass

# Terminate open tunnels if exist
ngrok.kill()

# Setting the authtoken (optional)
# Get your authtoken from https://dashboard.ngrok.com/auth
NGROK_AUTH_TOKEN = getpass('Enter the ngrok authtoken: ')
ngrok.set_auth_token(NGROK_AUTH_TOKEN)

# Open an HTTPS tunnel on port 5000 for http://localhost:5000
ngrok_tunnel = ngrok.connect(addr="5000", proto="http", bind_tls=True)
print("MLflow Tracking UI:", ngrok_tunnel.public_url)

Here I modified the code to use getpass, since storing access tokens in plaintext is not recommended.
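If typing the token on every run gets tedious, the same idea can be wrapped in a small helper that first checks an environment variable and only falls back to getpass. This is a sketch under assumptions: the helper name read_secret and the variable name NGROK_AUTH_TOKEN are hypothetical and not defined by pyngrok or MLflow.

```python
import os
from getpass import getpass


def read_secret(env_var: str, prompt: str) -> str:
    """Return the secret stored in env_var if set, otherwise prompt for it.

    getpass() does not echo the typed value, so the token never
    appears in the notebook output.
    """
    value = os.environ.get(env_var, "").strip()
    if value:
        return value
    return getpass(prompt).strip()
```

It would then be used as ngrok.set_auth_token(read_secret("NGROK_AUTH_TOKEN", "Enter the ngrok authtoken: ")).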

Step 3: Log the experiment details. I'm assuming you already have code that logs to MLflow, but the example below is a simple showcase of how to log a run:

import mlflow

with mlflow.start_run(run_name="MLflow on Colab"):
  mlflow.log_metric("m1", 2.0)
  mlflow.log_param("p1", "mlflow-colab")

This has also been tested with the current version of MLflow – 1.15.0

Option 2: Use a hosted server

This option saves the setup of ngrok and the tunneling. It also provides the benefits of team access controls and an improved UI. There are 2 main options for this that I'm aware of: Databricks, and DAGsHub.

Databricks is the hosted enterprise solution for this, while DAGsHub is the free community option.

In the case where you use DAGsHub, you skip step 1, and step 2 becomes much simpler. The snippet above becomes the following (after creating an account and a project on the relevant platform):

!pip install mlflow --quiet

import mlflow
import os
from getpass import getpass

os.environ['MLFLOW_TRACKING_USERNAME'] = input('Enter your DAGsHub username: ')
os.environ['MLFLOW_TRACKING_PASSWORD'] = getpass('Enter your DAGsHub access token: ')
os.environ['MLFLOW_TRACKING_PROJECTNAME'] = input('Enter your DAGsHub project name: ')

mlflow.set_tracking_uri(f"https://dagshub.com/{os.environ['MLFLOW_TRACKING_USERNAME']}/{os.environ['MLFLOW_TRACKING_PROJECTNAME']}.mlflow")

with mlflow.start_run(run_name="MLflow on Colab"):
  mlflow.log_metric("m1", 2.0)
  mlflow.log_param("p1", "mlflow-colab")

As you can see, this is significantly shorter. It also has the benefit of being persistent.
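The tracking URI in the snippet above follows a fixed pattern, so it can be factored into a small function. This is only a sketch: dagshub_tracking_uri is a hypothetical helper name, and the URL pattern is copied from the snippet rather than from DAGsHub's documentation.

```python
def dagshub_tracking_uri(username: str, project: str) -> str:
    """Build the tracking URI in the form used by the snippet above."""
    username = username.strip().strip("/")
    project = project.strip().strip("/")
    if not username or not project:
        raise ValueError("both username and project name are required")
    return f"https://dagshub.com/{username}/{project}.mlflow"
```

The result can be passed straight to mlflow.set_tracking_uri(), which avoids storing the project name in an environment variable that MLflow itself does not read.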

score 5 · Answer 3 · edited Jun 25 '21 at 19:05

You could use the free MLflow tracking server provided by Databricks Community Edition together with Google Colab. The following GIF shows how to set up an MLflow tracking server on Databricks:

[GIF: Steps for setting up MLflow tracking server on Databricks]

To access this MLflow setup from Google Colab, follow the code snippets below:

Snippet #1

!pip install mlflow
!databricks configure --host https://community.cloud.databricks.com/

After you run the snippet above, it prompts you to enter the username and password of the Databricks account that you’ve just created. Please do that.

Snippet #2

import mlflow
mlflow.set_tracking_uri("databricks")
mlflow.set_experiment("<Enter your copied experiment name here>")

At the end of the GIF above, I copy the experiment name. Please do the same and pass your experiment name to the set_experiment() function.

By following the steps above, you can be sure that MLflow is configured on Google Colab!

Btw, I've written a medium story on the same, please do check that out: Intro to MLflow — With Colab — Part 1/2

Jules · Answer 4 · 2020-06-12T17:19:16.520

-1

Try this. The big question, though, is how to get to the UI:

https://github.com/dmatrix/mlflow-tests/blob/master/mllfow_test.ipynb

One option is to download the /drive/mlruns directory onto your localhost, and launch mlflow ui on the local host.
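The download option could be sketched as follows: zip the run directory on Colab, download the archive, then unzip it locally. The helper name archive_mlruns is hypothetical, and the location of the mlruns directory depends on where your runs were actually written.

```python
import shutil
from pathlib import Path


def archive_mlruns(mlruns_dir: str, archive_base: str) -> str:
    """Zip an mlruns directory and return the path of the created archive.

    The archive keeps the top-level folder name, so unzipping it
    recreates the directory wherever it is extracted.
    """
    src = Path(mlruns_dir).resolve()
    if not src.is_dir():
        raise FileNotFoundError(f"no such directory: {src}")
    return shutil.make_archive(
        archive_base, "zip", root_dir=src.parent, base_dir=src.name
    )
```

On Colab the returned path can be passed to google.colab.files.download(). After unzipping on your machine, running mlflow ui from the folder that contains mlruns should list the runs, although artifact paths recorded in the run metadata may still point at the Colab location.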

I'm not sure whether setting DISPLAY to localhost:0:0 will work for getting the remote server to display on your localhost.

edited Jun 12 '20 at 17:19

answered Jun 12 '20 at 16:18


Thank you, Jules. I shall try the following two approaches outlined in: 1. https://arthought.com/colab-mlflow-and-papermill/; 2. https://towardsdatascience.com/colab-synergy-with-mlflow-how-to-monitor-progress-and-store-models-fbfdf7bb0d7d – SvitlanaGA...supportsUkraine Jun 12 '20 at 17:05

The link is now dead – desertnaut Sep 09 '20 at 12:03
Yup, I can confirm that it is a dead link – d_- Oct 14 '21 at 19:41
