I am using the dbx CLI to deploy my workflow to Databricks. My .dbx/project.json is configured as follows:
{
  "environments": {
    "default": {
      "profile": "test",
      "storage_type": "mlflow",
      "properties": {
        "workspace_directory": "/Shared/dbx/projects/test",
        "artifact_location": "dbfs:/dbx/test"
      }
    }
  },
  "inplace_jinja_support": false,
  "failsafe_cluster_reuse_with_assets": false,
  "context_based_upload_for_execute": false
}
Every time I run dbx deploy ..., it stores my task scripts in DBFS under a new hash-named folder. If I run dbx deploy ... 100 times, it creates 100 hash folders to store my artifacts.
Questions
- How do I clean up these folders?
- Is there a retention or rolling policy that keeps only the last X folders?
- Is there a way to reuse the same folder on every deploy?
As you can see, a lot of folders are generated whenever we run dbx deploy. We only want to use the latest one; the older ones are no longer needed.
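In the meantime I am considering pruning the old folders myself with a script. Below is a minimal sketch of that workaround, using the DBFS REST API list and delete endpoints. The artifact root /dbx/test comes from the artifact_location in my config above; DATABRICKS_HOST, DATABRICKS_TOKEN, and KEEP_LAST are my own assumptions for illustration, not anything dbx provides:

import os
import requests

# Assumed environment: DATABRICKS_HOST (e.g. https://<workspace>.cloud.databricks.com)
# and a DATABRICKS_TOKEN personal access token. Adjust to your auth setup.
HOST = os.environ["DATABRICKS_HOST"].rstrip("/")
HEADERS = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}

# DBFS path of the artifact_location (dbfs:/dbx/test) from .dbx/project.json.
ARTIFACT_ROOT = "/dbx/test"
# Hypothetical retention: keep only the newest N deploy folders. Keep at least
# the current one, since the deployed job still references its artifacts.
KEEP_LAST = 5

# List the hash folders under the artifact root.
resp = requests.get(
    f"{HOST}/api/2.0/dbfs/list",
    headers=HEADERS,
    params={"path": ARTIFACT_ROOT},
)
resp.raise_for_status()
folders = [f for f in resp.json().get("files", []) if f["is_dir"]]

# Sort newest first by modification time, then delete everything past KEEP_LAST.
folders.sort(key=lambda f: f["modification_time"], reverse=True)
for stale in folders[KEEP_LAST:]:
    requests.post(
        f"{HOST}/api/2.0/dbfs/delete",
        headers=HEADERS,
        json={"path": stale["path"], "recursive": True},
    ).raise_for_status()
    print(f"Deleted {stale['path']}")

This feels like something dbx should handle for me, though, so I would prefer a built-in retention or folder-reuse option if one exists.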