
I've set up a Python script that takes certain BigQuery tables from one dataset, cleans them with a SQL query, and adds the cleaned tables to a new dataset. The script works correctly. I want to set this up as a Cloud Function that triggers at midnight every day.

I've also used Cloud Scheduler to send a message to a Pub/Sub topic at midnight every day, and I've verified that this works correctly. I am new to Pub/Sub, but I followed the tutorial in the documentation and managed to set up a test Cloud Function that prints out "hello world" when it receives a message from the topic.
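To double-check the trigger outside of the schedule, a test message can be published to the topic by hand, roughly like this (just a sketch; the project ID and topic name are placeholders, and it assumes the google-cloud-pubsub client library is installed):

# Publish the same payload that the scheduler sends, so the function's
# 'midnight' check passes. Project ID and topic name are placeholders.
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path('PROJECT_ID', 'scheduled_updates')

# Publish raw bytes; Cloud Functions delivers them base64-encoded in event['data']
future = publisher.publish(topic_path, data=b'midnight')
print(future.result())  # prints the message ID once the publish succeeds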

However, when I try to combine the two and automate my script, I get a log message saying that the execution crashed:

Function execution took 1119 ms, finished with status: 'crash'

To help you understand what I'm doing, here is the code in my main.py:

# Global libraries
import base64

# Local libraries
from scripts.one_minute_tables import helper

def one_minute_tables(event, context):

    # Log out the message that triggered the function
    print("""This Function was triggered by messageId {} published at {}
    """.format(context.event_id, context.timestamp))

    # Get the message from the event data
    name = base64.b64decode(event['data']).decode('utf-8')

    # If it's the message for the daily midnight schedule, execute function
    if name == 'midnight':
        helper.format_tables('raw_data','table1')
    else:
        pass
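For reference, the handler can also be exercised locally by calling it with a fake event and context along these lines (a sketch; FakeContext is just a stand-in for the object Cloud Functions normally passes in):

# Local smoke test for the handler. FakeContext only mimics the two
# attributes the function reads (event_id and timestamp).
import base64

from main import one_minute_tables

class FakeContext:
    event_id = 'test-event-id'
    timestamp = '2020-05-15T00:00:00Z'

# Pub/Sub delivers the payload base64-encoded in event['data']
event = {'data': base64.b64encode(b'midnight')}

one_minute_tables(event, FakeContext())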

For the sake of convenience, this is a simplified version of my Python script (scripts/one_minute_tables/helper.py):

# Global libraries
from google.cloud import bigquery
import os

# Login to bigquery by providing credentials
credential_path = 'secret.json'
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = credential_path

def format_tables(dataset, list_of_tables):

    # Initialize the client
    client = bigquery.Client()

    # Loop through the list of tables
    for table in list_of_tables:

        # Create the query object
        script = f"""
            SELECT *
            FROM {dataset}.{table}
        """

        # Call the API
        query = client.query(script)

        # Wait for job to finish
        results = query.result()

        # Print
        print('Data cleaned and updated in table: {}.{}'.format(dataset, table))
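In the full (non-simplified) version, the cleaned results are also written into a table in the new dataset; that part looks roughly like this (a sketch; the 'clean_data' dataset name and the write disposition are placeholders for what I actually use):

# Sketch of writing query results into the cleaned dataset.
# 'clean_data' is a placeholder for the real destination dataset.
def clean_and_save_table(client, raw_dataset, clean_dataset, table):

    # Send the query results to a table of the same name in the cleaned dataset
    job_config = bigquery.QueryJobConfig()
    job_config.destination = client.dataset(clean_dataset).table(table)
    job_config.write_disposition = bigquery.WriteDisposition.WRITE_TRUNCATE

    script = f"""
        SELECT *
        FROM {raw_dataset}.{table}
    """

    # Run the query and wait for the destination table to be written
    client.query(script, job_config=job_config).result()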

This is my folder structure:

[screenshot of folder structure: main.py and requirements.txt at the project root, with the script in scripts/one_minute_tables/helper.py]

And my requirements.txt file has only one entry in it: google-cloud-bigquery==1.24.0

I'd appreciate your help in figuring out what I need to fix so that this script runs from the Pub/Sub trigger without the execution crashing.

EDIT: Based on the comments, this is the log of the function crash:

{
  "textPayload": "Function execution took 1078 ms, finished with status: 'crash'",
  "insertId": "000000-689fdf20-aee2-4900-b5a1-91c34d7c1448",
  "resource": {
    "type": "cloud_function",
    "labels": {
      "function_name": "one_minute_tables",
      "region": "us-central1",
      "project_id": "PROJECT_ID"
    }
  },
  "timestamp": "2020-05-15T16:53:53.672758031Z",
  "severity": "DEBUG",
  "labels": {
    "execution_id": "x883cqs07f2w"
  },
  "logName": "projects/PROJECT_ID/logs/cloudfunctions.googleapis.com%2Fcloud-functions",
  "trace": "projects/PROJECT_ID/traces/f391b48a469cbbaeccad5d04b4a704a0",
  "receiveTimestamp": "2020-05-15T16:53:53.871051291Z"
}
    When you look at the logs of your cloud function, what is the traceback error? I am assuming the first python script you posted (def one_minute_tables) is the one being triggered by the pubsub, correct? – Zavalagrah May 14 '20 at 17:08
  • What's the configuration of your function? Have you created a trigger-http function and an HTTP push subscription to Pub/Sub? Or did you create it with --trigger-topic? – guillaume blaquiere May 14 '20 at 18:45
  • One idea is to try to catch the stack trace that Cloud Functions sometimes suppresses. Use [Approach 2 in this answer](https://stackoverflow.com/a/54396161/1119153) as guidance – manasouza May 15 '20 at 15:58
  • @guillaumeblaquiere I've set it up with a trigger-topic `gcloud functions deploy one_minute_tables --runtime python37 --trigger-topic scheduled_updates` – Abhay May 15 '20 at 16:51
  • @MajorHonda Yes, the function `one_minute_tables` is triggered by pubsub. I looked at the log and I'm going to edit my question to add the log of the function crash – Abhay May 15 '20 at 17:03
  • I think I found your bug. I've posted an answer to make it clearer; if it doesn't work, I will delete my answer. – guillaume blaquiere May 15 '20 at 18:30

1 Answer


The problem comes from the `list_of_tables` parameter. You call your function like this:

    if name == 'midnight':
        helper.format_tables('raw_data','table1')

and then `format_tables` iterates over the string 'table1' rather than over a list of table names. Since a string is iterable character by character, each loop iteration builds a query against a one-character table such as raw_data.t, which fails and crashes the function.
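You can see the effect with a quick check (purely illustrative):

    # Iterating over a string yields its characters, not table names
    for table in 'table1':
        print(table)   # prints t, a, b, l, e, 1 on separate lines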

Pass the table name inside a list instead, and it should work:

    if name == 'midnight':
        helper.format_tables('raw_data',['table1'])
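If you also want format_tables to tolerate being called with a single table name, one option is to normalise the argument at the top of the function, something like this sketch:

    def format_tables(dataset, list_of_tables):

        # Accept either a single table name or a list of table names
        if isinstance(list_of_tables, str):
            list_of_tables = [list_of_tables]

        # ... rest of the function unchanged ...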
guillaume blaquiere