The title of this question is self-explanatory.

I have a Kubeflow pipeline defined as follows:

# Kubeflow Pipelines component that reformats CSV data to JSONL
reformat_input_op = reformat_input_component(test_path)

# Connect the reformat op's output path to the batch prediction op
batch_prediction_request_jsonl_path = reformat_input_op.outputs['Batch Prediction Input GCS Path']

# Batch prediction op
batch_prediction_op = gcc_aip.ModelBatchPredictOp(
    project="<project id>",
    job_display_name="Model Batch Prediction",
    location="us-west1",
    model=model_output,
    gcs_source_uris=[batch_prediction_request_jsonl_path],
    instances_format="jsonl",
    gcs_destination_output_uri_prefix="gs://<bucket name>/<directory to file output>/",
    machine_type="n1-standard-4",
    accelerator_count=2,
    accelerator_type="NVIDIA_TESLA_P100",
)

The gcs_source_uris argument of ModelBatchPredictOp does not accept the output of the previous component, even though that component outputs a string path. I get this error: TypeError: Object of type PipelineParam is not JSON serializable.
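As far as I can tell, the wording of that error is what Python's json module raises when it is asked to serialize an object it has no encoder for. The sketch below reproduces the same failure without kfp installed. FakePipelineParam, serialize_uris and the bucket path are hypothetical stand-ins I made up for illustration, and I am assuming (not confirming) that the list argument is serialized with json.dumps somewhere inside the component.

```python
import json


class FakePipelineParam:
    """Hypothetical stand-in for kfp's PipelineParam: a placeholder object, not a string."""

    def __init__(self, name):
        self.name = name


def serialize_uris(uris):
    """Serialize a list argument to JSON (assumed to mirror what happens to gcs_source_uris)."""
    return json.dumps(uris)


# A list of plain strings serializes without trouble.
literal = serialize_uris(["gs://my-bucket/input.jsonl"])  # hypothetical path

# A list wrapping a placeholder object does not: json has no encoder for it.
try:
    serialize_uris([FakePipelineParam("Batch Prediction Input GCS Path")])
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(literal)
print(error_message)
```

If that assumption holds, the problem is not the string path itself but the fact that the placeholder is wrapped in a Python list before its value is known.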

This is a problem because I do not want to hardcode the GCS path that the previous component writes to. I want the path that the previous component outputs to be passed into the next component at runtime.

What are some workarounds? This is somewhat of a duplicate of the Stack Overflow question "Vertex AI Model Batch prediction, issue with referencing existing model and input file on Cloud Storage", but I don't feel the question was clearly answered there. Is there a way to pass the output of a previous component directly into ModelBatchPredictOp()?

Note: I do not want an answer saying "use the .after() method on the ModelBatchPredictOp()". I know that is an option. What I don't understand is why ModelBatchPredictOp(), a Kubeflow Pipelines component, doesn't accept the outputs of previous components directly. Is there a way to pass the output of a previous component directly, without writing it to some other GCS path and then reading it back after calling .after()? If there is no way to pass the outputs of previous components directly, this is horrible design.

AndrewJaeyoung
  • Could you try `gcs_source_uris=f'gs://{BUCKET_NAME}/….'` instead of `gcs_source_uris=[batch_prediction_request_jsonl_path]`? – kiran mathew May 23 '23 at 14:53
