
What I'm looking to do:

  • Run some BigQuery queries
  • Output results as JSON files
  • Upload JSON files to GCS

How I'm trying to do it:

  1. Install and initialise Google Cloud SDK: gcloud auth activate-service-account --key-file="gcp-credentials.json"
  2. Enable APIs:
gcloud services enable \
    bigquery.googleapis.com \
    cloudbuild.googleapis.com \
    cloudfunctions.googleapis.com \
    cloudscheduler.googleapis.com \
    pubsub.googleapis.com \
    serviceusage.googleapis.com \
    storage-component.googleapis.com
  3. Write up code:
src
|__ data
|__ queries
|   |__ test_query_1.sql
|   |__ test_query_2.sql
|   |__ test_query_3.sql
|__ scripts
    |__ config.py
    |__ log.txt
    |__ main.py
    |__ requirements.txt

requirements.txt:

google-cloud-bigquery
google-cloud-storage

config.py:

from pathlib import Path

# Resolve paths relative to this file so the script works from any working directory
src_dir = Path(__file__).absolute().parent

config_vars = {
    "data_dir": src_dir.parent / "data",        # where the JSON output is written
    "queries_dir": src_dir.parent / "queries",  # where the .sql files live
    "bucket": "...",                            # target GCS bucket name
}

main.py:

import ...
data_dir = config.config_vars["data_dir"]
queries_dir = config.config_vars["queries_dir"]

def main(data, context):
    ...

if __name__ == "__main__":
    main("data", "context")

So the main.py script takes all queries in the queries folder, runs them, outputs the results as JSON files, and then uploads them to a bucket called "test-bucket-20201219", creating the bucket first if it doesn't exist.
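Roughly, the elided parts of main.py amount to something like the sketch below (the NotFound handling, logging setup, and loop details are a reconstruction, not the exact code):

import json
import logging

from google.cloud import bigquery, storage
from google.cloud.exceptions import NotFound

import config

data_dir = config.config_vars["data_dir"]
queries_dir = config.config_vars["queries_dir"]
bucket_name = config.config_vars["bucket"]

logging.basicConfig(format="%(asctime)s | %(levelname)s | %(message)s",
                    level=logging.DEBUG)
logger = logging.getLogger(__name__)


def main(data, context):
    bq_client = bigquery.Client()
    gcs_client = storage.Client()

    # Create the bucket if it doesn't exist yet
    try:
        bucket = gcs_client.get_bucket(bucket_name)
    except NotFound:
        bucket = gcs_client.create_bucket(bucket_name)

    for sql_file in sorted(queries_dir.glob("*.sql")):
        # Run the query and write the rows to a local JSON file
        rows = [dict(row) for row in bq_client.query(sql_file.read_text()).result()]
        json_path = data_dir / f"{sql_file.stem}.json"
        json_path.write_text(json.dumps(rows, default=str))

        # Upload the JSON file to the bucket
        logger.info("Uploading %s to %s.", json_path.name, bucket_name)
        bucket.blob(json_path.name).upload_from_filename(str(json_path))


if __name__ == "__main__":
    main("data", "context")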

The script runs fine locally, but when it's deployed and scheduled in GCP via Pub/Sub and Cloud Scheduler, it runs and creates the bucket but doesn't upload the files. I'm not sure what I'm doing wrong; any help would be much appreciated. I've tried everything I can think of, e.g. permitting PROJECTID@appspot.gserviceaccount.com to add objects to the bucket.
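For reference, the deployment and scheduling were set up roughly like this (the topic, job, and function names here are illustrative, not the exact ones used):

gcloud pubsub topics create run-queries

gcloud functions deploy run_queries \
    --runtime python38 \
    --trigger-topic run-queries \
    --entry-point main \
    --source src/scripts

gcloud scheduler jobs create pubsub run-queries-daily \
    --schedule "0 6 * * *" \
    --topic run-queries \
    --message-body "run"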

Logging statements:

2020-12-20 18:43:50,656 | INFO | Uploading test_query_2.json to test-bucket-20201219.
2020-12-20 18:43:50,962 | DEBUG | https://storage.googleapis.com:443 "POST /upload/storage/v1/b/test-bucket-20201219/o?uploadType=multipart HTTP/1.1" 200 776
2020-12-20 18:43:50,963 | INFO | Uploading test_query_3.json to test-bucket-20201219.
2020-12-20 18:43:51,238 | DEBUG | https://storage.googleapis.com:443 "POST /upload/storage/v1/b/test-bucket-20201219/o?uploadType=multipart HTTP/1.1" 200 776
2020-12-20 18:43:51,239 | INFO | Uploading test_query_1.json to test-bucket-20201219.
2020-12-20 18:43:51,466 | DEBUG | https://storage.googleapis.com:443 "POST /upload/storage/v1/b/test-bucket-20201219/o?uploadType=multipart HTTP/1.1" 200 775
  • Assuming that the code is failing at some point, we should be looking in the Cloud Logging logs to see what (if anything) is logged. Consider also adding log statements to your code for debugging purposes and validating that we have reached all the points we expect. – Kolban Dec 21 '20 at 00:00
  • Cloud Logging says everything was fine, except it doesn't say that files were uploaded. I've added log statements, but when I go to the Cloud Functions page and try to check the source I get a generic error message: "An unknown error has occurred in Cloud Functions. The attempted action failed, please try again.", and where it's supposed to show the source code I get an "Unknown error while fetching the archive". I'm going to shut down the project and then restart it, hoping that'll create the missing dependencies? – AK91 Dec 21 '20 at 04:12
  • Got the logs from downloading the zip file. Added them to description – AK91 Dec 21 '20 at 04:35
  • It would also help if you provided the whole code in main.py: how did you upload the files and create the bucket? Try enabling `storage.googleapis.com`, and here are other links that might help with uploading: [https://stackoverflow.com/a/37102815/8753991](https://stackoverflow.com/a/37102815/8753991), [https://cloud.google.com/storage/docs/uploading-objects...](https://cloud.google.com/storage/docs/uploading-objects#storage-upload-object-python) – JM Gelilio Dec 22 '20 at 05:47

1 Answer


Appreciate the help from everyone; I got it to work somehow. I think I was missing one of the buckets that are automatically generated the first time your Cloud Function runs/deploys (staging.PROJECT_ID.appspot.com). Also, since I didn't want to store credentials along with the function's repo, I deployed the function from gcloud using the --service-account flag with a value of the form PROJECT_ID@appspot.gserviceaccount.com. To be honest I'm not entirely sure that what I've done is correct, but it worked for me.
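For anyone hitting the same thing, the deploy command ended up looking roughly like this (function and topic names are illustrative):

gcloud functions deploy run_queries \
    --runtime python38 \
    --trigger-topic run-queries \
    --entry-point main \
    --service-account PROJECT_ID@appspot.gserviceaccount.com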

