I'm trying to migrate CSV files that were exported from BigQuery to Google Cloud Storage (GCS) into a PostgreSQL Google Cloud SQL instance using a Python script.
I was hoping to use the Cloud SQL Admin API, but found this in the documentation:
> Importing CSV data using the Cloud SQL Admin API is not supported for PostgreSQL instances.
As an alternative, I could use the psycopg2 library and stream the rows of the CSV file into the SQL instance. I can see three ways to do this (a rough sketch of the batch-stream option follows the list):

- Line by line: read each line, submit the insert, and commit after every row.
- Batch stream: read each line, submit the inserts, and commit after every 10, 100, etc. lines.
- The entire CSV: read each line, submit the inserts, and only commit once at the end of the file.
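For reference, the batch-stream option I have in mind looks roughly like the sketch below (the bucket path, table name, columns, and connection details are all placeholders):

```python
# Rough sketch of the "batch stream" option: read the exported CSV from GCS,
# insert rows with psycopg2, and commit every BATCH_SIZE rows.
import csv
import io

import psycopg2
from google.cloud import storage

BATCH_SIZE = 1000  # commit every 1000 rows

# Download the exported CSV from GCS (assumes the file fits in memory).
client = storage.Client()
blob = client.bucket("my-bucket").blob("export/data.csv")
csv_text = blob.download_as_text()

conn = psycopg2.connect(
    host="127.0.0.1",  # e.g. via the Cloud SQL Auth proxy
    dbname="my_db",
    user="my_user",
    password="my_password",
)

with conn, conn.cursor() as cur:
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row in the BigQuery export
    batch = []
    for row in reader:
        batch.append(row)
        if len(batch) >= BATCH_SIZE:
            cur.executemany(
                "INSERT INTO my_table (col_a, col_b) VALUES (%s, %s)", batch
            )
            conn.commit()
            batch = []
    if batch:  # flush the final partial batch
        cur.executemany(
            "INSERT INTO my_table (col_a, col_b) VALUES (%s, %s)", batch
        )
        conn.commit()
```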
My concern is that these CSV files could contain millions of rows, and running this process with any of the three options above seems like a bad idea to me.
What alternatives do I have? Essentially, I have some raw data in BigQuery that we preprocess before exporting to GCS in preparation for importing into the PostgreSQL instance. I need to get this preprocessed data from BigQuery into the PostgreSQL instance.
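For context, the export step of the current pipeline looks roughly like this (the project, dataset, table, and bucket names are placeholders):

```python
# Rough sketch of the current BigQuery -> GCS export step.
from google.cloud import bigquery

client = bigquery.Client()

extract_job = client.extract_table(
    "my-project.my_dataset.preprocessed_table",
    "gs://my-bucket/export/data-*.csv",  # wildcard lets BigQuery shard large exports
    location="US",
)
extract_job.result()  # wait for the export to finish
```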
This is not a duplicate of this question, as I'm ideally looking for a solution that exports data from BigQuery to the PostgreSQL instance, whether via GCS or directly.