
I have a function that returns a Pipeline and defines a Parameter inside it, e.g. (taken from here):

def get_pipeline(...):

    foo = ParameterString(
        name="Foo", default_value="foo"
    )

    # pipeline's steps definition here
    step = ProcessingStep(
        name=...,
        job_arguments=["--foo", foo],
    )

    # build the pipeline first, then return it
    pipeline = Pipeline(
        name=pipeline_name,
        parameters=[...],
        steps=[...],
        sagemaker_session=sagemaker_session,
    )
    return pipeline

I know I can access the default value of a parameter by simply reading foo.default_value, but how can I access its value when the default is overridden at run time, e.g. by using

pipeline.start(parameters=dict(Foo='bar'))

?

My assumption is that in that case I don't want to read the default value, since it has been overridden, but the Parameter API is very limited and does not provide anything except name and default_value.

uarfr

1 Answer


As written in the documentation:

Pipeline parameters can only be evaluated at run time. If a pipeline parameter needs to be evaluated at compile time, then it will throw an exception.

A way to use parameters as ProcessingStep arguments

If you need to use them in a pipeline step, in particular a ProcessingStep, pass them through the arguments of the processor's run method (which is different from job_arguments) and give the result to the step as step_args.

See this official example.

If you pass pipeline_session as the processor's sagemaker_session, calling .run() does not launch the processing job; it returns the arguments needed to run the job as a step in the pipeline.

step_process = ProcessingStep(
   step_args=your_processor.run(
       # ...
       arguments=["--foo", foo]
   )
)

In addition, there are some limitations: Not all built-in Python operations can be applied to parameters.

An example taken from the link above:

# An example of what not to do
my_string = "s3://{}/training".format(ParameterString(name="MyBucket", default_value=""))

# Another example of what not to do
int_param = ParameterInteger(name="MyBucket", default_value=1)
int_as_str = str(int_param)

# Instead, if you want to convert the parameter to string type, do
int_param.to_string()

# A workaround is to use Join
my_string = Join(on="", values=[
    "s3://",
    ParameterString(name="MyBucket", default_value=""),
    "/training"]
)
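To make the run-time idea concrete, here is a toy sketch of why Join works where str.format does not. ToyParameter and ToyJoin are hypothetical stand-ins written for this illustration, not the SDK classes, and the JSON shape they emit is only my assumption of roughly what a pipeline definition contains: a reference that the service resolves later, never a Python value.

```python
import json


class ToyParameter:
    """Hypothetical stand-in for a pipeline parameter: a named placeholder."""

    def __init__(self, name, default_value):
        self.name = name
        self.default_value = default_value

    def expr(self):
        # A reference to be resolved by the service at run time (assumed shape).
        return {"Get": f"Parameters.{self.name}"}


class ToyJoin:
    """Hypothetical stand-in for Join: records the concatenation, does not perform it."""

    def __init__(self, on, values):
        self.on = on
        self.values = values

    def expr(self):
        # Placeholders stay placeholders; plain strings are kept as they are.
        parts = [v.expr() if hasattr(v, "expr") else v for v in self.values]
        return {"Std:Join": {"On": self.on, "Values": parts}}


bucket = ToyParameter(name="MyBucket", default_value="")
uri = ToyJoin(on="", values=["s3://", bucket, "/training"])

# The "compiled" definition still contains the reference, not a bucket name.
definition = json.dumps(uri.expr())
print(definition)
```

The point of the sketch is only that nothing in the definition knows the overridden value yet, which is why Python-side string operations on a parameter cannot see it.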

A way to use parameters to manipulate the pipeline internally

Personally, I prefer to pass the value directly to the function that builds the pipeline definition, before starting the pipeline:

def get_pipeline(my_param_hardcoded, ...):

    # here you can use my_param_hardcoded as a plain Python value

    my_param = ParameterString(
        name="Foo", default_value="foo"
    )

    # pipeline's steps definition here

    pipeline = Pipeline(
        name=pipeline_name,
        parameters=[my_param, ...],
        steps=[...],
        sagemaker_session=sagemaker_session,
    )
    return pipeline

pipeline = get_pipeline(my_param_hardcoded, ...)
# pass the same value again so the run-time parameter matches
pipeline.start(parameters=dict(Foo=my_param_hardcoded))

Obviously this is not very elegant, but I do not think it is conceptually wrong: after all, it is a parameter that is used to manipulate the pipeline itself and cannot be pre-processed beforehand (e.g. in a configuration file).

One use case is building a name from the pipeline_name (which is passed to get_pipeline()) and a pipeline parameter. For example, if we wanted a custom name for a step, it could be the concatenation of the two strings; this cannot happen at run time, so it must be done with this trick.

Giuseppe La Gualano
  • Why is passing a parameter that I want to e.g. give as an argument to a ProcessingStep (see edited post) considered to be needed at compile time and not at run time? – uarfr Dec 19 '22 at 18:02
  • OK thanks for the clarification, I have edited the answer by inserting what you need with the official sources. I also left the additional suggestions in any case. Does that answer your question now? – Giuseppe La Gualano Dec 19 '22 at 19:00
  • That works with a `ParameterString` but not with a `ParameterFloat`, since it can't be converted to a string. This brings me back to my original issue. As much as I appreciate your help and hacky solution, to me that represents a workaround to a strong limitation/issue of Sagemaker – uarfr Dec 20 '22 at 08:16
  • I confirm. If you also want to bypass this limitation, you can use ParameterString instead of ParameterFloat (also because the arguments will all be evaluated as strings anyway) and cast the string to float internally in your scripts. If I have answered your question you may accept it, thank you! – Giuseppe La Gualano Dec 20 '22 at 09:18
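
For the cast mentioned in the last comment, a minimal sketch of what the processing script could do, assuming the step passes ["--foo", foo] as arguments. The parse_args helper and the 0.5 default are hypothetical, chosen only for this illustration; the argument always arrives as a string and argparse converts it.

```python
import argparse


def parse_args(argv=None):
    """Parse the script arguments; --foo arrives as a string and is cast to float."""
    parser = argparse.ArgumentParser()
    # type=float performs the string-to-float cast inside the script.
    parser.add_argument("--foo", type=float, default=0.5)
    return parser.parse_args(argv)


# Simulated command line for illustration; in the real script call parse_args()
# with no argument so that it reads sys.argv.
args = parse_args(["--foo", "0.25"])
print(args.foo)
```

With this, the pipeline side can stay a ParameterString while the script works with a float.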