I would like to set up MLflow with the following components:
- Backend store (local) : using a SQLite database locally to store MLflow entities (run_id, params, metrics...)
- Artifact store (remote) : using a blob storage on my Azure Data Lake Storage Gen2 to store the output files (versioned datasets, serialized models, images, ...) related to my model
- Tracking server : by using something that looks like this command
mlflow server --backend-store-uri sqlite:///C:\sqlite\db\mlruns.db --default-artifact-root wasbs://container-name@storage_account_name.blob.core.windows.net/mlartifacts -h 0.0.0.0 -p 8000
Where mlruns.db is a database that I created in SQLite (inside a db folder) and mlartifacts is the folder I created inside the blob container to receive all the output files.
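To make the pieces of that command explicit, here is a minimal sketch that assembles the two URIs and the argument list. The helper names (sqlite_uri, wasbs_uri, server_command) are hypothetical, and writing the SQLite path with forward slashes is my assumption about what the backend URI accepts on Windows; the flags are the ones from the command above.

```python
from pathlib import PureWindowsPath


def sqlite_uri(db_path: str) -> str:
    # "sqlite:///" followed by the file path, written with forward slashes
    # (assumption: this form is accepted for a Windows path).
    return "sqlite:///" + PureWindowsPath(db_path).as_posix()


def wasbs_uri(container: str, account: str, folder: str) -> str:
    # wasbs://<container>@<account>.blob.core.windows.net/<folder>
    return f"wasbs://{container}@{account}.blob.core.windows.net/{folder.strip('/')}"


def server_command(db_path, container, account, folder, host="0.0.0.0", port=8000):
    # Same flags as the command in the question, as a list usable with subprocess.
    return [
        "mlflow", "server",
        "--backend-store-uri", sqlite_uri(db_path),
        "--default-artifact-root", wasbs_uri(container, account, folder),
        "-h", host,
        "-p", str(port),
    ]


if __name__ == "__main__":
    cmd = server_command(r"C:\sqlite\db\mlruns.db", "container-name",
                         "storage_account_name", "mlartifacts")
    print(" ".join(cmd))
```

The list form could be passed to subprocess.run, but printing it is enough to compare against what is typed in the terminal.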
I run this command and then I do an mlflow run (or a kedro run, as I'm using Kedro), but almost nothing happens. The database is populated with 12 tables, but they are all empty, and nothing is written to the data lake.
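One thing that can be checked on the client side is whether the process doing the run is pointed at the server at all. MLflow clients read the MLFLOW_TRACKING_URI environment variable; the sketch below only inspects that variable. The function names are hypothetical, http://localhost:8000 is an assumed address for a server started on the same machine with -p 8000, and with Kedro the URI may instead come from the plugin's own configuration.

```python
import os


def tracking_target(env=None) -> str:
    """Return the tracking URI found in the environment, or '' if it is unset."""
    env = os.environ if env is None else env
    return env.get("MLFLOW_TRACKING_URI", "").strip()


def points_at_server(uri: str) -> bool:
    # A running `mlflow server` is reached over HTTP(S), not via a file or DB URI.
    return uri.startswith(("http://", "https://"))


if __name__ == "__main__":
    uri = tracking_target()
    if not uri:
        print("MLFLOW_TRACKING_URI is not set in this process")
    elif points_at_server(uri):
        print(f"client would talk to the tracking server at {uri}")
    else:
        print(f"client would bypass the server and use {uri} directly")
```

If the variable is empty in the shell that launches the run, the runs may be going somewhere other than the server started above.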
What I want should look like Scenario 4 in the documentation.
For the artifact store, I couldn't find detailed instructions. I tried to look at MLflow's documentation here, but it was not very helpful (I'm still a beginner). It says:
MLflow expects Azure Storage access credentials in the AZURE_STORAGE_CONNECTION_STRING, AZURE_STORAGE_ACCESS_KEY environment variables or having your credentials configured such that the DefaultAzureCredential() class can pick them up.
However, even when adding the env variables, nothing seems to be stored in the data lake. I created the two env variables (on Windows 10):
AZURE_STORAGE_ACCESS_KEY = wasbs://container-name@storage_account_name.blob.core.windows.net/mlartifacts
AZURE_STORAGE_CONNECTION_STRING = DefaultEndpointsProtocol=https;AccountName=storagesample;AccountKey=. I got it by following this path on Azure Portal : Storage account/Access keys/Connection string (took the one of key 2).
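A small sanity check of these two variables could look like the sketch below. The function names are hypothetical, and the rule that AZURE_STORAGE_ACCESS_KEY should hold an account key rather than a wasbs:// URI is my assumption from the variable's name, since the quoted documentation does not spell out the expected format.

```python
import os


def parse_connection_string(conn_str: str) -> dict:
    """Split 'Key=Value;Key=Value' pairs into a dict."""
    parts = {}
    for item in conn_str.split(";"):
        if not item.strip():
            continue
        # Split on the first '=' only: account keys are base64 and may end in '='.
        key, _, value = item.partition("=")
        parts[key.strip()] = value.strip()
    return parts


def check_azure_env(env=None) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    env = os.environ if env is None else env
    problems = []
    key = env.get("AZURE_STORAGE_ACCESS_KEY", "")
    conn = env.get("AZURE_STORAGE_CONNECTION_STRING", "")
    if not key and not conn:
        problems.append("neither variable is set")
    if "://" in key:
        # Assumption: an access key is a secret string, never a URI.
        problems.append("AZURE_STORAGE_ACCESS_KEY looks like a URI, not an account key")
    if conn:
        parts = parse_connection_string(conn)
        for required in ("AccountName", "AccountKey"):
            if not parts.get(required):
                problems.append(f"connection string has no {required}")
    return problems


if __name__ == "__main__":
    for problem in check_azure_env() or ["no obvious problem found"]:
        print(problem)
```

Run in the same shell that starts the server and in the one that starts the run, it shows whether both processes actually see the variables.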
They also state that :
Also, you must run pip install azure-storage-blob separately (on both your client and the server) to access Azure Blob Storage. Finally, if you want to use DefaultAzureCredential, you must pip install azure-identity; MLflow does not declare a dependency on these packages by default.
I added them to my project requirements, but what exactly do they mean by installing on both the client and the server? How does azure-identity help in the setup?
Could you please help me with step-by-step instructions for the complete setup?
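My current reading, which may be wrong, is that "client" means the Python environment that executes the run and "server" means the environment where mlflow server is started. Under that assumption, the sketch below could be run in each environment to see whether the two packages are importable there. The helper names are hypothetical; the module names azure.storage.blob and azure.identity are the import names of the two pip packages.

```python
import importlib.util

# pip package name -> module it provides
REQUIRED = {
    "azure-storage-blob": "azure.storage.blob",
    "azure-identity": "azure.identity",
}


def is_installed(module_name: str) -> bool:
    """True if the module can be located in the current environment."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. 'azure') is itself missing.
        return False


def missing_packages() -> list:
    """Names of the pip packages whose modules cannot be found."""
    return [pip_name for pip_name, module in REQUIRED.items()
            if not is_installed(module)]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("missing in this environment: " + ", ".join(missing))
    else:
        print("both Azure packages are importable here")
```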
Thank you in advance !