I am working on a legacy Kubeflow project whose pipelines have a few components that apply filters to a data frame.
To do this, each component downloads the data frame from S3, applies its filter, and uploads the result back to S3.
The components that train or validate the models then download the data frame from S3 in the same way.
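To make the setup concrete, here is a rough sketch of what one of the filter components does today (the function, bucket, keys, and the filter condition are made up for illustration; the real components use boto3 and pandas in roughly this way):

```python
import boto3
import pandas as pd

def filter_component(bucket: str, input_key: str, output_key: str) -> None:
    s3 = boto3.client("s3")

    # Download the data frame produced by the previous component.
    s3.download_file(bucket, input_key, "/tmp/input.parquet")
    df = pd.read_parquet("/tmp/input.parquet")

    # Apply this component's filter (placeholder condition).
    df = df[df["value"] > 0]

    # Upload the filtered data frame back to S3 for the next component.
    df.to_parquet("/tmp/output.parquet")
    s3.upload_file("/tmp/output.parquet", bucket, output_key)
```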
My question is whether this is a best practice, or whether it would be better to pass the data frame directly between components, since any of the S3 uploads can fail and take the whole pipeline down with it.
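By "directly" I mean something like letting the KFP SDK pass the intermediate file between components as an output/input artifact instead of us managing S3 explicitly. My rough understanding (I may be misreading the artifact-passing API, and the names below are hypothetical) is that it would look like this with the v1 SDK:

```python
from kfp.components import InputPath, OutputPath, create_component_from_func

def filter_component(input_path: InputPath("parquet"),
                     output_path: OutputPath("parquet")) -> None:
    import pandas as pd

    # KFP hands us the upstream artifact as a local file path.
    df = pd.read_parquet(input_path)

    # Same placeholder filter as above.
    df = df[df["value"] > 0]

    # Writing to output_path makes the result available to downstream components.
    df.to_parquet(output_path)

filter_op = create_component_from_func(
    filter_component,
    packages_to_install=["pandas", "pyarrow"],
)
```

Is that the recommended way, or is the explicit S3 round trip fine?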
Thanks