What I'm looking to do:
- Run some BigQuery queries
- Output results as JSON files
- Upload JSON files to GCS
How I'm trying to do it:
- Install and initialise Google Cloud SDK:
gcloud auth activate-service-account --key-file="gcp-credentials.json"
- Enable APIs:
gcloud services enable \
bigquery.googleapis.com \
cloudbuild.googleapis.com \
cloudfunctions.googleapis.com \
cloudscheduler.googleapis.com \
pubsub.googleapis.com \
serviceusage.googleapis.com \
storage-component.googleapis.com
- Write up code:
src
|__data
|__queries
|__test_query_1.sql
|__test_query_2.sql
|__test_query_3.sql
|__scripts
|__config.py
|__log.txt
|__main.py
|__requirements.txt
requirements.txt
google-cloud-bigquery
google-cloud-storage
config.py:
from pathlib import Path
src_dir = Path(__file__).absolute().parent
config_vars = {
"data_dir": src_dir.parent / "data",
"queries_dir": src_dir.parent / "queries",
"bucket": "...",
}
main.py:
import ...
data_dir = config.config_vars["data_dir"]
queries_dir = config.config_vars["queries_dir"]
def main(data, context):
...
if __name__ == "__main__":
main("data", "context")
So the main.py
script, takes all queries in the queries folder, runs them, outputs them as JSON and then uploads them to a bucket called "test-bucket-20201219". If the bucket doesn't exist then it creates it.
The script runs fine locally but when it's deployed and scheduled in GCP via PubSub and Google Scheduler, it runs and creates the bucket but doesn't upload the files...I'm not sure what I'm doing wrong. Any help would be much appreciated. Tried everything - e.g. permitted the PROJECTID@appspot.gserviceaccount.com to add objects to bucket.
Logging statements:
2020-12-20 18:43:50,656 | INFO | Uploading test_query_2.json to test-bucket-20201219.
2020-12-20 18:43:50,962 | DEBUG | https://storage.googleapis.com:443 "POST /upload/storage/v1/b/test-bucket-20201219/o?uploadType=multipart HTTP/1.1" 200 776
2020-12-20 18:43:50,963 | INFO | Uploading test_query_3.json to test-bucket-20201219.
2020-12-20 18:43:51,238 | DEBUG | https://storage.googleapis.com:443 "POST /upload/storage/v1/b/test-bucket-20201219/o?uploadType=multipart HTTP/1.1" 200 776
2020-12-20 18:43:51,239 | INFO | Uploading test_query_1.json to test-bucket-20201219.
2020-12-20 18:43:51,466 | DEBUG | https://storage.googleapis.com:443 "POST /upload/storage/v1/b/test-bucket-20201219/o?uploadType=multipart HTTP/1.1" 200 775