You can use the PythonScriptStep class in Azure Machine Learning to run a Python script that builds a formatted data path based on the trigger time.
Example:
Python script file (script.py):
import datetime
# The run starts when the schedule triggers, so the current time stands in for the trigger time
current_time = datetime.datetime.now()
# Format the current time to match the dataset path format
dataset_path = "path_on_datastore/{}/{}/{}/some_data.tsv".format(current_time.year, current_time.month, current_time.day)
# Use the dataset path in your further processing or operations
print(dataset_path)
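If the folders on the datastore are zero-padded (for example 2023/06/01 rather than 2023/6/1), formatting the bare integers as above will not match them. A minimal sketch of one way to handle both layouts follows. The helper name build_dataset_path and its zero_pad flag are hypothetical, not part of the original script, and the path is the same placeholder as above:

```python
import datetime


def build_dataset_path(when, zero_pad=True):
    """Return the date-partitioned path for the given datetime.

    zero_pad=True  -> path_on_datastore/2023/06/01/some_data.tsv
    zero_pad=False -> path_on_datastore/2023/6/1/some_data.tsv
    """
    if zero_pad:
        # strftime pads month and day to two digits
        date_part = when.strftime("%Y/%m/%d")
    else:
        # Same layout as the original script: no padding
        date_part = "{}/{}/{}".format(when.year, when.month, when.day)
    return "path_on_datastore/{}/some_data.tsv".format(date_part)


if __name__ == "__main__":
    # In the scheduled run, the current time stands in for the trigger time
    print(build_dataset_path(datetime.datetime.now()))
```

Taking the datetime as a parameter also makes it possible to check the path for a past date without waiting for a trigger.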
With the script you can create a pipeline:
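from azureml.core import Workspace, Experiment
from azureml.pipeline.core import Pipeline, Schedule, ScheduleRecurrence
from azureml.pipeline.steps import PythonScriptStep
workspace = Workspace.from_config()
script_step = PythonScriptStep(
    name="Get Dataset Path",
    script_name="script.py",
    compute_target="targetCompute",
    inputs=[],
    outputs=[],
    source_directory="./",
    allow_reuse=False  # always rerun so each trigger computes a fresh path
)
pipeline = Pipeline(workspace=workspace, steps=[script_step])
Then publish the pipeline and schedule it:
# A schedule is attached to a published pipeline (the pipeline and schedule names are placeholders)
published_pipeline = pipeline.publish(name="dataset_path_pipeline", description="Builds the dataset path for the trigger date")
# Daily execution at 8:00 AM (UTC unless time_zone is set), starting 2023-06-01
daily_schedule = ScheduleRecurrence(frequency="Day", interval=1, start_time="2023-06-01T08:00:00", hours=[8], minutes=[0])
pipeline_schedule = Schedule.create(
    workspace,
    name="daily_dataset_path_schedule",
    pipeline_id=published_pipeline.id,
    experiment_name="dataset_scheduling_experiment",
    recurrence=daily_schedule,
    description="Daily pipeline schedule"
)
# Optional: run the pipeline once right away, outside the schedule
experiment = Experiment(workspace, "dataset_scheduling_experiment")
pipeline_run = experiment.submit(pipeline)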
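Before deploying a daily schedule, it can help to check locally which paths the script would produce on consecutive triggers, especially across a month boundary. The sketch below uses only plain datetime arithmetic and does not call Azure ML. daily_trigger_times is a hypothetical helper, the start date is an arbitrary example, and it assumes the compute node's clock and the schedule share the same time zone:

```python
import datetime


def daily_trigger_times(start, count):
    """Yield `count` consecutive daily trigger times beginning at `start`."""
    for offset in range(count):
        yield start + datetime.timedelta(days=offset)


# Example start: 8:00 AM on 29 June 2023 (an arbitrary date chosen to cross a month boundary)
start = datetime.datetime(2023, 6, 29, 8, 0)

# Same path layout as script.py
paths = [
    "path_on_datastore/{}/{}/{}/some_data.tsv".format(t.year, t.month, t.day)
    for t in daily_trigger_times(start, 3)
]

for p in paths:
    print(p)
```

If the schedule's time zone differs from the clock on the compute target, a run triggered near midnight could resolve to the previous or next day's folder, so it is worth confirming both before relying on the generated paths.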
To disable or update the schedule:
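from azureml.pipeline.core import Schedule
# Specify the ID of the pipeline schedule (Schedule.get looks a schedule up by ID, not by name)
schedule_id = 'your_schedule_id'
schedule = Schedule.get(workspace, schedule_id)
# Disable the schedule
schedule.disable()
# Update the schedule, for example with a new description (a new recurrence can be passed the same way)
schedule.update(description="Updated daily pipeline schedule")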
The example above shows how to use the PythonScriptStep class with the current time from datetime as the trigger time.
For more information, please refer to this.
Note: Adjust the Python script and the datastore paths as necessary.