My pipeline run directories are being created at the root of my default S3 bucket, and it is getting messy. I've been trying to set a subdirectory in my default S3 bucket to store all my pipeline run directories (PreProcess, Train, Evaluate, Interpret...), but I haven't succeeded yet. Can someone help me do that, please?
I tried changing my step name from CrossPreprocess-Data to sagemaker/cross-project/CrossPreprocess-Data, and it worked: the pipeline directories were created inside sagemaker/cross-project.
step_process = ProcessingStep(
    name="sagemaker/cross-project/CrossPreprocess-Data",
    processor=sklearn_processor,
    outputs=[
        ProcessingOutput(output_name="train", source="/opt/ml/processing/train"),
        ProcessingOutput(output_name="validation", source="/opt/ml/processing/val"),
        ProcessingOutput(output_name="test", source="/opt/ml/processing/test"),
        ProcessingOutput(output_name="metafiles", source="/opt/ml/processing/metafiles"),
    ],
    code=os.path.join(BASE_DIR, "preprocess.py"),
    job_arguments=["--input-data", input_data, "--run-datetime", run_datetime, "--project-name", project_name],
)
BUT on the next step I got this error and couldn't continue the execution:
ClientError: An error occurred (ValidationException) when calling the UpdatePipeline operation: Unable to parse pipeline definition. Invalid property reference 'Steps.sagemaker/cross-project/CrossPreprocess-Data.ProcessingOutputConfig.Outputs['train'].S3Output.S3Uri' in GetFunction definition.
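My guess is that the slashes in the step name break the property path that the next step uses to locate the preprocessing output. For context, this is roughly how my training step consumes it (the step name "CrossTrain-Model" and the estimator are placeholders from my project; the property path itself is the standard SDK pattern):

from sagemaker.inputs import TrainingInput
from sagemaker.workflow.steps import TrainingStep

# The training step resolves the preprocessing output through a property
# reference built from the previous step's name. With the name
# "sagemaker/cross-project/CrossPreprocess-Data", that reference contains
# slashes, which seems to be what the pipeline definition parser rejects.
step_train = TrainingStep(
    name="CrossTrain-Model",
    estimator=estimator,  # placeholder: my estimator, defined elsewhere
    inputs={
        "train": TrainingInput(
            s3_data=step_process.properties.ProcessingOutputConfig
                .Outputs["train"].S3Output.S3Uri,
        ),
    },
)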
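I'm now wondering whether the right approach is to keep the step name plain and instead set an explicit destination on each ProcessingOutput. Here is a minimal, untested sketch of what I mean, assuming the bucket comes from the default SageMaker session and sagemaker/cross-project is the prefix I want:

import os

import sagemaker
from sagemaker.processing import ProcessingOutput
from sagemaker.workflow.steps import ProcessingStep

bucket = sagemaker.Session().default_bucket()
prefix = f"s3://{bucket}/sagemaker/cross-project/CrossPreprocess-Data"

step_process = ProcessingStep(
    name="CrossPreprocess-Data",  # plain name, so property references stay parseable
    processor=sklearn_processor,
    outputs=[
        # Each output gets an explicit S3 destination under the subdirectory,
        # instead of relying on the default path derived from the step name.
        ProcessingOutput(output_name="train", source="/opt/ml/processing/train",
                         destination=f"{prefix}/train"),
        ProcessingOutput(output_name="validation", source="/opt/ml/processing/val",
                         destination=f"{prefix}/validation"),
        ProcessingOutput(output_name="test", source="/opt/ml/processing/test",
                         destination=f"{prefix}/test"),
        ProcessingOutput(output_name="metafiles", source="/opt/ml/processing/metafiles",
                         destination=f"{prefix}/metafiles"),
    ],
    code=os.path.join(BASE_DIR, "preprocess.py"),
    job_arguments=["--input-data", input_data, "--run-datetime", run_datetime,
                   "--project-name", project_name],
)

One thing I'm unsure about with this approach: the default paths include a unique job-name suffix, so hardcoding destinations like this would make successive runs overwrite each other unless I fold something like run_datetime into the prefix. Is that the expected trade-off, or is there a cleaner way to move everything under a subdirectory?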