To speed up model training in Azure ML, I am using `DataReference`'s `as_download()` functionality to download the data from blob storage to a compute instance instead of mounting it. To do that, I run the job with a `PythonScriptStep(...)`. However, `PythonScriptStep` uses an environment (in my case, a custom Docker image) to run the job. Here is a stripped-down sketch of my setup (SDK v1; the datastore name, image, script name, and compute target are placeholders):
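```python
from azureml.core import Workspace, Datastore, Environment
from azureml.core.runconfig import RunConfiguration
from azureml.data.data_reference import DataReference
from azureml.pipeline.steps import PythonScriptStep

ws = Workspace.from_config()
datastore = Datastore.get(ws, "workspaceblobstore")  # placeholder datastore name

# Run config pointing at my custom Docker environment
env = Environment("my-custom-env")
env.docker.base_image = "myregistry.azurecr.io/my-image:latest"  # placeholder image
run_config = RunConfiguration()
run_config.environment = env

# as_download() is supposed to copy the data to the target instead of mounting it
data_ref = DataReference(
    datastore=datastore,
    data_reference_name="dataset",
    path_on_datastore="dataset",
).as_download(path_on_compute="/tmp/dataset")

train_step = PythonScriptStep(
    name="train",
    script_name="train.py",        # placeholder training script
    source_directory=".",
    arguments=["--data_dir", data_ref],
    inputs=[data_ref],
    compute_target="my-compute",   # placeholder compute target name
    runconfig=run_config,
)
```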
With this setup, the data is copied into the container's `/tmp/dataset` rather than the compute instance's `/tmp/dataset`, and because `/tmp/dataset` inside the container is mapped to `/mnt/...` outside the container, I don't gain any speed-up during model training. I tried `docker -v /tmp:/tmp`, with no luck.
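For concreteness, this is roughly how I passed that flag, via SDK v1's `DockerConfiguration` (which accepts extra `docker run` arguments) on the run config from the sketch above:

```python
from azureml.core.runconfig import DockerConfiguration

# Attempted bind mount of the host's /tmp into the container via extra
# "docker run" arguments; this did not change where the data landed.
run_config.docker = DockerConfiguration(
    use_docker=True,
    arguments=["-v", "/tmp:/tmp"],
)
```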
So here is the question: how can I copy data DIRECTLY from blob storage to the compute instance's SSD? I have tried just about everything I can think of, without success.