UPDATE: I've narrowed the problem down a bit, so I've removed the unnecessary code and examples.
UPDATE 2: After leaving the cron job running at 12-hour intervals for quite some time (each run ending in success, but with nothing written to BigQuery), we were shocked to discover that, after about a week, one of the cron jobs did successfully write to BigQuery, along with Stackdriver logs stating that "this request caused a new process to be started for your application (...)", as copied below. The jobs that followed again stopped writing. Now I'm wondering whether this is somehow connected to cached app state (with some expiration period) or to a credential expiration date, either of which might prevent further writes to BigQuery after the first one without producing any errors.
Problem description:
I am trying to set up a cron job in App Engine (standard environment) that queries from and writes data back to BigQuery (the datasets are in the same project as the deployed app). The cron jobs execute successfully, but they only write to BigQuery on the first execution after deployment; after that they still execute successfully but do not write anything.
The main difference I've found is in the Stackdriver logs: executions which correctly write to BigQuery contain extra debug and info messages, while subsequent executions have no such messages:
2018-04-19 04:44:03.933 CEST
Converted retries value: 3 -> Retry(total=3, connect=None, read=None, redirect=None, status=None) (/base/data/home/apps/e~<redacted>/lib/urllib3/util/retry.py:200)
2018-04-19 04:44:04.154 CEST
Making request: POST https://accounts.google.com/o/oauth2/token (/base/data/home/apps/e~<redacted>/lib/google/auth/transport/requests.py:117)
2018-04-19 04:44:04.160 CEST
Starting new HTTPS connection (1): accounts.google.com (/base/data/home/apps/e~<redacted>/lib/urllib3/connectionpool.py:824)
2018-04-19 04:44:04.329 CEST
https://accounts.google.com:443 "POST /o/oauth2/token HTTP/1.1" 200 None (/base/data/home/apps/e~<redacted>/lib/urllib3/connectionpool.py:396)
2018-04-19 04:44:04.339 CEST
Starting new HTTPS connection (1): www.googleapis.com (/base/data/home/apps/e~<redacted>/lib/urllib3/connectionpool.py:824)
2018-04-19 04:44:04.802 CEST
https://www.googleapis.com:443 "POST /bigquery/v2/projects/<redacted>/jobs HTTP/1.1" 200 None (/base/data/home/apps/e~<redacted>/lib/urllib3/connectionpool.py:396)
2018-04-19 04:44:04.813 CEST
This request caused a new process to be started for your application, and thus caused your application code to be loaded for the first time. This request may thus take longer and use more CPU than a typical request for your application.
I've tried:
- Adding BigQuery DataOwner and User permissions for the default App Engine service account, but there was no effect.
- Since there are mentions that the google.cloud library is not fully supported on standard App Engine, I tried using OAuth2/httplib2/googleapiclient credentials to authenticate instead, but this is the first time I've tried it and I don't understand how to put the pieces together; without the google.cloud library I don't even know how to write a proper query for BQ (a rough sketch of what I have in mind is shown after this list).
- Other credential setup approaches, as suggested below, but it seems that connecting to BQ is not the issue; they all connect and write (once), the problem is just repeating the write within an already-deployed app.
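For reference, this is roughly how I understand the googleapiclient-based approach would look; I haven't gotten it to work end to end, the project/dataset/table IDs below are placeholders, and I may well be putting the pieces together wrong:

from googleapiclient import discovery
from oauth2client.client import GoogleCredentials

def write_data_test_apiclient(project_id='<redacted>',
                              dataset_id='<redacted>',
                              table_id='<redacted>'):
    # Use the application default credentials (the default App Engine service account).
    credentials = GoogleCredentials.get_application_default()
    bq = discovery.build('bigquery', 'v2', credentials=credentials)

    job_body = {
        'configuration': {
            'query': {
                'query': "SELECT CURRENT_DATETIME() AS Datetime, 'CRON' AS Source",
                'useLegacySql': False,
                'destinationTable': {
                    'projectId': project_id,
                    'datasetId': dataset_id,
                    'tableId': table_id,
                },
                'writeDisposition': 'WRITE_APPEND',
            }
        }
    }
    # Submit the query job; this only inserts the job, it does not wait for it to finish.
    return bq.jobs().insert(projectId=project_id, body=job_body).execute()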
Below is the full implementation:
app.yaml:
runtime: python27
api_version: 1
threadsafe: true

handlers:
- url: /bigquerycron
  script: bigquerycron.app
  login: admin

libraries:
- name: ssl
  version: latest

env_variables:
  GAE_USE_SOCKETS_HTTPLIB: 'true'
bigquerycron.py:
from __future__ import absolute_import
from google.cloud import bigquery
import webapp2


class MainPage(webapp2.RequestHandler):
    def get(self):
        self.response.headers['Content-Type'] = 'text/plain'
        self.response.write('CRON test page')


def writeDataTest(dataset_id='<redacted>', table_id='<redacted>'):
    client = bigquery.Client.from_service_account_json("credentials.json")
    job_config = bigquery.QueryJobConfig()
    table_ref = client.dataset(dataset_id).table(table_id)
    job_config.destination = table_ref
    job_config.write_disposition = 'WRITE_APPEND'
    query_job = client.query(
        """SELECT CURRENT_DATETIME() AS Datetime, 'CRON' as Source""",
        job_config=job_config)

writeDataTest()

app = webapp2.WSGIApplication([
    ('/bigquerycron', MainPage),
], debug=True)
cron.yaml:
cron:
- url: /bigquerycron
  schedule: every 30 minutes
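One more note: in the code above, client.query() only submits the job and nothing ever waits for it or checks for errors. I'm not sure this matters, but here is a sketch of the same write made blocking so that any failure would at least show up in the logs (dataset/table IDs are again placeholders):

import logging

from google.cloud import bigquery

def write_data_test_blocking(dataset_id='<redacted>', table_id='<redacted>'):
    client = bigquery.Client.from_service_account_json("credentials.json")
    job_config = bigquery.QueryJobConfig()
    job_config.destination = client.dataset(dataset_id).table(table_id)
    job_config.write_disposition = 'WRITE_APPEND'
    query_job = client.query(
        "SELECT CURRENT_DATETIME() AS Datetime, 'CRON' AS Source",
        job_config=job_config)
    query_job.result()  # block until the job completes; raises on failure
    logging.info('BigQuery job %s finished', query_job.job_id)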