To speed up model training in Azure ML, I am using Azure ML DataReference's as_download() functionality to download the data from blob storage to a compute instance instead of mounting it. To do that, I am using PythonScriptStep(...). However, PythonScriptStep uses an environment (in my case, a custom Docker image) to run the job. As a result, the data is copied into the container's /tmp/dataset rather than the compute instance's /tmp/dataset, and because /tmp/dataset inside the container is mapped to /mnt/... on the host, I don't gain any speed-up during model training. I tried docker run -v /tmp:/tmp with no luck.
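For reference, here is roughly what my setup looks like (a minimal sketch using the Azure ML SDK v1; the datastore name, environment name, script, path, and compute target below are placeholders, not my real values):

```python
from azureml.core import Workspace, Datastore, Environment
from azureml.core.runconfig import RunConfiguration
from azureml.data.data_reference import DataReference
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import PythonScriptStep

ws = Workspace.from_config()
datastore = Datastore.get(ws, "workspaceblobstore")  # placeholder datastore name

# as_download() copies the data before the script starts, but the copy
# lands on the path Docker maps into the container, not the host's SSD.
data_ref = DataReference(
    datastore=datastore,
    data_reference_name="dataset",
    path_on_datastore="dataset/",  # placeholder path
).as_download(path_on_compute="/tmp/dataset")

run_config = RunConfiguration()
run_config.environment = Environment.get(ws, "my-custom-docker-env")  # placeholder

step = PythonScriptStep(
    name="train",
    script_name="train.py",  # placeholder training script
    arguments=["--data-dir", str(data_ref)],
    inputs=[data_ref],
    compute_target="my-compute",  # placeholder compute target
    runconfig=run_config,
)
pipeline = Pipeline(workspace=ws, steps=[step])
```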

So here is the question: how can I copy data DIRECTLY from blob storage to the compute instance's SSD?

I tried almost everything with no luck.

Sina
  • Have you already tried connecting a datastore and running your training on a remote Azure ML cluster rather than a single compute instance? Maybe that would speed up your run. https://learn.microsoft.com/en-us/azure/machine-learning/how-to-datastore – Sysanin Jul 18 '23 at 12:32
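For what it's worth, the commenter's suggestion would look roughly like the sketch below (again SDK v1; the dataset path, script, experiment name, and cluster name are placeholders):

```python
from azureml.core import Workspace, Dataset, ScriptRunConfig, Experiment

ws = Workspace.from_config()
datastore = ws.get_default_datastore()

# Register the data as a FileDataset backed by the blob datastore.
dataset = Dataset.File.from_files(path=(datastore, "dataset/"))  # placeholder path

config = ScriptRunConfig(
    source_directory=".",
    script="train.py",  # placeholder training script
    arguments=["--data-dir", dataset.as_download()],  # or dataset.as_mount()
    compute_target="gpu-cluster",  # placeholder remote AmlCompute cluster
)
run = Experiment(ws, "train-experiment").submit(config)
```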

0 Answers