My pipeline run directories are being created at the root of my default S3 bucket, and it is getting messy. I've been trying to set a subdirectory in my default S3 bucket to store all my pipeline run directories (PreProcess, Train, Evaluate, Interpret...), but I haven't succeeded yet. Can someone help me do that, please?
I tried changing my step name from CrossPreprocess-Data to sagemaker/cross-project/CrossPreprocess-Data, and it worked: the pipeline directories were created inside sagemaker/cross-project.
step_process = ProcessingStep(
    name="sagemaker/cross-project/CrossPreprocess-Data",
    processor=sklearn_processor,
    outputs=[
        ProcessingOutput(output_name="train", source="/opt/ml/processing/train"),
        ProcessingOutput(output_name="validation", source="/opt/ml/processing/val"),
        ProcessingOutput(output_name="test", source="/opt/ml/processing/test"),
        ProcessingOutput(output_name="metafiles", source="/opt/ml/processing/metafiles"),
    ],
    code=os.path.join(BASE_DIR, "preprocess.py"),
    job_arguments=["--input-data", input_data, "--run-datetime", run_datetime, "--project-name", project_name],
)
BUT on the next step I got this error and couldn't continue the execution:
ClientError: An error occurred (ValidationException) when calling the UpdatePipeline operation: Unable to parse pipeline definition. Invalid property reference 'Steps.sagemaker/cross-project/CrossPreprocess-Data.ProcessingOutputConfig.Outputs['train'].S3Output.S3Uri' in GetFunction definition.
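My guess is that the slashes in the step name break the property path that the next step uses to locate the preprocessing output. For context, this is roughly how my training step consumes it (the step name "CrossTrain-Model" and the estimator are placeholders from my project; the property path itself is the standard SDK pattern):

from sagemaker.inputs import TrainingInput
from sagemaker.workflow.steps import TrainingStep

# The training step resolves the preprocessing output through a property
# reference built from the previous step's name. With the name
# "sagemaker/cross-project/CrossPreprocess-Data", that reference contains
# slashes, which seems to be what the pipeline definition parser rejects.
step_train = TrainingStep(
    name="CrossTrain-Model",
    estimator=estimator,  # placeholder: my estimator, defined elsewhere
    inputs={
        "train": TrainingInput(
            s3_data=step_process.properties.ProcessingOutputConfig
                .Outputs["train"].S3Output.S3Uri,
        ),
    },
)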
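I'm now wondering whether the right approach is to keep the step name plain and instead set an explicit destination on each ProcessingOutput. Here is a minimal, untested sketch of what I mean, assuming the bucket comes from the default SageMaker session and sagemaker/cross-project is the prefix I want:

import os

import sagemaker
from sagemaker.processing import ProcessingOutput
from sagemaker.workflow.steps import ProcessingStep

bucket = sagemaker.Session().default_bucket()
prefix = f"s3://{bucket}/sagemaker/cross-project/CrossPreprocess-Data"

step_process = ProcessingStep(
    name="CrossPreprocess-Data",  # plain name, so property references stay parseable
    processor=sklearn_processor,
    outputs=[
        # Each output gets an explicit S3 destination under the subdirectory,
        # instead of relying on the default path derived from the step name.
        ProcessingOutput(output_name="train", source="/opt/ml/processing/train",
                         destination=f"{prefix}/train"),
        ProcessingOutput(output_name="validation", source="/opt/ml/processing/val",
                         destination=f"{prefix}/validation"),
        ProcessingOutput(output_name="test", source="/opt/ml/processing/test",
                         destination=f"{prefix}/test"),
        ProcessingOutput(output_name="metafiles", source="/opt/ml/processing/metafiles",
                         destination=f"{prefix}/metafiles"),
    ],
    code=os.path.join(BASE_DIR, "preprocess.py"),
    job_arguments=["--input-data", input_data, "--run-datetime", run_datetime,
                   "--project-name", project_name],
)

One thing I'm unsure about with this approach: the default paths include a unique job-name suffix, so hardcoding destinations like this would make successive runs overwrite each other unless I fold something like run_datetime into the prefix. Is that the expected trade-off, or is there a cleaner way to move everything under a subdirectory?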