I was wondering if it is possible/how easy it would be to implement a TFX pipeline (on a real dataset, with 100+ GB dataset, not a tutorial with a small dataset) in AWS?
For the orchestration, I might use Kubeflow. But I suppose, the major issue would be setting up a proper scalable runner for the Apache Beam. I am thinking of using Apache Flink for that.
Anyone with experience doing it? How would you go about putting a TF in production in AWS in general when you need to train the model on a regular basis on new data, do you write the pipeline from scratch or use some tool?