I am experimenting with GCP. My current environment is a local Hadoop setup: files stored on HDFS, plus a bunch of Python scripts that make API calls and trigger Pig jobs. These Python scripts are scheduled via cron.
I want to understand the best way to do something similar in GCP. I understand that GCS can serve as an HDFS replacement, and that Dataproc can be used to spin up Hadoop clusters and run Pig jobs.
Is it possible to store these Python scripts in GCS, spin up Hadoop clusters on a cron-like schedule, and point those clusters at the scripts in GCS to run?
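For concreteness, this is roughly the flow I have in mind, sketched with the gcloud CLI (the cluster name, region, bucket, and script paths here are all invented for illustration):

```shell
# Hypothetical sketch only -- names and paths are made up.
# 1. Spin up an ephemeral Dataproc cluster:
gcloud dataproc clusters create etl-cluster \
    --region=us-central1 \
    --num-workers=2

# 2. Submit a Pig job whose script lives in GCS:
gcloud dataproc jobs submit pig \
    --cluster=etl-cluster \
    --region=us-central1 \
    --file=gs://my-bucket/scripts/transform.pig

# 3. Tear the cluster down when the job finishes:
gcloud dataproc clusters delete etl-cluster \
    --region=us-central1 --quiet
```

What I don't know is the idiomatic way to wrap steps like these in a cron-style schedule on GCP, and whether the Python driver scripts themselves can live in GCS and be invoked the same way.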