
We currently have a Data Factory pipeline that successfully calls one of our ML Studio Pipelines. After the ML Studio Pipeline completes, we want Azure Data Factory to pick up the results of the ML Studio Pipeline and store them in SQL Server.

We found that the PipelineData class stores the results in a blob folder based on the child run id, which makes it hard for Data Factory to pick up the results. We then discovered OutputFileDatasetConfig, which allows ML Studio to save the results to a static location for Data Factory. This worked great for Data Factory, except OutputFileDatasetConfig doesn't always work :( since it's an experimental class. It took us a while to figure this out, and we even created a Stack Overflow question for it, which we resolved and which can be found here: Azure ML Studio ML Pipeline - Exception: No temp file found
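
For reference, this is roughly how OutputFileDatasetConfig can be wired up to write to a fixed path (a minimal sketch; the datastore, folder, script, and compute names below are placeholders, not our exact setup):

from azureml.core import Workspace, Datastore
from azureml.data import OutputFileDatasetConfig
from azureml.pipeline.steps import PythonScriptStep

ws = Workspace.from_config()
datastore = Datastore.get(ws, "workspaceblobstore")

# Write the step output to a fixed folder instead of a run-id-based one
output = OutputFileDatasetConfig(
    name="scoring_results",
    destination=(datastore, "scoring_results/")
)

scoring_step = PythonScriptStep(
    name="score",
    script_name="score.py",          # placeholder scoring script
    arguments=["--output_dir", output],
    compute_target="cpu-cluster",    # placeholder compute name
    source_directory="."
)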

We returned to using the PipelineData class, which stores the results in a blob folder based on the child run id, but we can't figure out how to get Data Factory to find that blob based on the child run id of the ML Studio Pipeline it just ran.
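
For comparison, a PipelineData output is declared like this (again a minimal sketch with placeholder names); the results then land under a run-specific path, roughly azureml/<child run id>/scoring_results, which changes on every run:

from azureml.pipeline.core import PipelineData

# Results end up in a folder keyed by the child run id,
# so Data Factory can't predict the path ahead of time.
output_dir = PipelineData(
    name="scoring_results",
    datastore=datastore
)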

So my question is: how do you get Data Factory to pick up the results of an ML Studio Pipeline that was triggered from a Data Factory pipeline?

Here is a simple visual of the Data Factory pipeline we're trying to build.

Step 1: Store Data in azure file store -->
Step 2: Run ML Studio scoring Pipeline -->
Step 3: Copy Results to SQL Server

Step 3 is the step we can't figure out. Any help would be greatly appreciated. Thanks and happy coding!


1 Answer


I think I answered my own question. It turns out my question is similar to another question asked a few months ago, and its top solution worked for me.

How to write Azure machine learning batch scoring results to data lake?

I was able to use DataTransferStep as follows.

from azureml.pipeline.steps import DataTransferStep

# output_dir is the PipelineData produced by the scoring step;
# blob_data_ref is a DataReference to the static blob folder Data Factory reads;
# data_factory_compute is the attached Data Factory compute (see sketch below).
transfer_ml_to_blob = DataTransferStep(
    name="transfer_ml_to_blob",
    source_data_reference=output_dir,
    destination_data_reference=blob_data_ref,
    compute_target=data_factory_compute,
    source_reference_type='directory',
    destination_reference_type='directory'
)
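
For anyone landing here, this is roughly how the referenced objects can be created (a minimal sketch; the datastore, compute, and path names are placeholders for your own):

from azureml.core import Workspace, Datastore
from azureml.core.compute import DataFactoryCompute
from azureml.data.data_reference import DataReference

ws = Workspace.from_config()
blob_datastore = Datastore.get(ws, "workspaceblobstore")

# Data Factory compute attached to the AML workspace (placeholder name)
data_factory_compute = DataFactoryCompute(ws, "adf-compute")

# Static blob folder the results get copied to; Data Factory's
# Copy activity can then read this fixed path into SQL Server.
blob_data_ref = DataReference(
    datastore=blob_datastore,
    data_reference_name="scoring_results",
    path_on_datastore="scoring_results/"
)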

Some other helpful resources:

https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-data-transfer.ipynb

https://social.msdn.microsoft.com/Forums/en-US/026b9b1d-6961-4217-b179-0c1973ac1fa2/data-transfer-job-failed-with-unexpected-error-systeminvalidoperationexception-blob-contains-both?forum=AzureMachineLearningService#7b46c5eb-b7f1-4c2f-a6d0-553672a83e7a

Azure ML PipelineData with DataTransferStep results in 0 bytes file
