The title of this question is self-explanatory.
If I have a Kubeflow pipeline in the following manner:
# this is a kubeflow pipelines component dedicated to reformatting csv data to jsonl format
reformat_input_op = reformat_input_component(test_path)
# connect reformat prediction inputs op to batch prediction op
batch_prediction_request_jsonl_path = reformat_input_op.outputs['Batch Prediction Input GCS Path']
# batch prediction op
batch_prediction_op = gcc_aip.ModelBatchPredictOp(
    project="<project id>",
    job_display_name="Model Batch Prediction",
    location="us-west1",
    model=model_output,
    gcs_source_uris=[batch_prediction_request_jsonl_path],
    instances_format="jsonl",
    gcs_destination_output_uri_prefix="gs://<bucket name>/<directory to file output>/",
    machine_type="n1-standard-4",
    accelerator_count=2,
    accelerator_type="NVIDIA_TESLA_P100",
)
The gcs_source_uris argument of ModelBatchPredictOp cannot ingest the output of the previous component, which outputs a string path (I get a TypeError: Object of type PipelineParam is not JSON serializable error).
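To illustrate what I think is happening (this is my guess, not something I have confirmed in the library source): at pipeline-definition time the previous component's output is a placeholder object rather than a string, and the list containing it presumably gets JSON-serialized before the pipeline runs. The sketch below reproduces the same error message with only the standard library. The PipelineParam class here is a hypothetical stand-in I wrote for the illustration, not the real kfp class, and the bucket path and op name are made up.

```python
import json


class PipelineParam:
    """Hypothetical stand-in for a compile-time placeholder (NOT the real kfp class)."""

    def __init__(self, name, op_name):
        self.name = name
        self.op_name = op_name


def try_serialize(value):
    """Return the JSON string, or the TypeError message if serialization fails."""
    try:
        return json.dumps(value)
    except TypeError as exc:
        return f"TypeError: {exc}"


# A literal string path (made-up bucket name) serializes without trouble.
literal_result = try_serialize(["gs://example-bucket/input.jsonl"])

# A placeholder object inside the list does not: json has no encoder for it.
placeholder = PipelineParam("Batch Prediction Input GCS Path", "reformat-input")
placeholder_result = try_serialize([placeholder])

print(literal_result)
print(placeholder_result)
```

If this guess is right, the failure is about when the value is resolved (definition time versus runtime), not about the path itself being malformed.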
This is troublesome because I do not want to hardcode the path to the GCS bucket that the previous component writes to. I want the path that the previous component outputs to be passed into the next component at runtime.
What are some workarounds for this? This is somewhat of a duplicate of this Stack Overflow question: Vertex AI Model Batch prediction, issue with referencing existing model and input file on Cloud Storage. However, I feel the question was not clearly answered there. Is there a way to pass an output from a previous component directly into ModelBatchPredictOp()?
Note: I do not want an answer saying "use the .after() method on the ModelBatchPredictOp()". I know this is an option, but I don't understand why ModelBatchPredictOp(), a Kubeflow Pipelines component, doesn't accept the outputs of previous components directly. Is there a way to pass the output of a previous component directly, without storing it at some other GCS path and then reading it back after using .after()? This is horrible design if there's no way to pass the outputs of the previous components directly.