
I created a Google Cloud Function with a Pub/Sub trigger (triggered by a Stackdriver sink). The function then changes the format of this data and saves it to BigQuery.

const {BigQuery} = require('@google-cloud/bigquery');
const bigquery = new BigQuery();

const environment = process.env.ENVIRONMENT || 'Dev';

const insertIntoBigQueryClient = async (locationObject) => {
    const metadata = locationObject.jsonPayload.metadata;

    const row = [{
        driverId: metadata.driverId,
        driverPhone: metadata.driverPhone,
        driverStatus: metadata.driverStatus,
        driverLocation: metadata.driverLocation.coordinates,
        timestamp: locationObject.timestamp
    }];
    // Insert data into a table
    return await bigquery
        .dataset(`YassirBackendLogging${environment}`)
        .table('DriverLocationStatus')
        .insert(row);
};


const driverLocationStatusProcessing = async (pubSubEvent, context) => {
    try {
        const logObject = JSON.parse(Buffer.from(pubSubEvent.data, 'base64').toString());
        insertIntoBigQueryClient(logObject);
    } catch (error) {
        console.error(error);
    }
};

// This part exists only to have multiple functions, one per environment.
switch (environment) {
    case 'Prod' :
        exports.driverLocationStatusProcessingProd = async (pubSubEvent, context) => {
            await driverLocationStatusProcessing(pubSubEvent, context);
        };
        break;
    case 'Dev' :
        exports.driverLocationStatusProcessingDev = async (pubSubEvent, context) => {
            await driverLocationStatusProcessing(pubSubEvent, context);
        };
        break;

    default :
        exports.driverLocationStatusProcessingDev = async (pubSubEvent, context) => {
            await driverLocationStatusProcessing(pubSubEvent, context);
        };
        break;
}

And this is the Cloud Build config:

steps:
  - name: 'gcr.io/cloud-builders/gcloud'
    args:
      - functions
      - deploy
      - 'driverLocationStatusProcessing$_ENVIRONMENT'
      - '--set-env-vars'
      - ENVIRONMENT=$_ENVIRONMENT
      - '--trigger-topic'
      - 'DriverLocationStatus$_ENVIRONMENT'
      - '--runtime'
      - nodejs8
      - '--timeout=540'
    dir: 'driver-location-status'

Now, this function works perfectly, but from time to time some errors appear out of nowhere, like the following:

Error: Could not load the default credentials. Browse to https://cloud.google.com/docs/authentication/getting-started for more information.
    at GoogleAuth.getApplicationDefaultAsync (/srv/node_modules/google-auth-library/build/src/auth/googleauth.js:161:19)
    at <anonymous>
    at process._tickDomainCallback (internal/process/next_tick.js:229:7)

Error: function crashed out of request scope Function cannot be executed.

I hope I can get some feedback on this matter. Maybe it has something to do with an async task?

  • 1) Never allow your code to crash in Cloud Functions. Always implement try/catch and error-handling logic. 2) Not being able to load credentials could be a networking problem (a transient Google problem) or a logic issue in your code. I would modify the code to directly fetch the ADC credentials and verify them; if the credentials are not valid, exit, and Pub/Sub will retry later (a minimal sketch of such a check follows this comment thread). – John Hanley Oct 03 '19 at 14:50
  • 3) Write your code to assume failure and handle failures, retries, timeouts, etc. This will help you get a better idea of where the problem originates. 4) Node.js is very popular, but I would not write Cloud Functions in Node.js. Go and Python make for simpler programs that are easier to debug (my opinion after writing thousands of functions). – John Hanley Oct 03 '19 at 14:50
  • I resolved the unhandled-rejection problem, but as for the credential problem, it is not a code issue. It happens maybe once in a million, but it is really bugging me, since I can't afford to lose data. I hope there is a way to solve it. – Bilel Abderrahmane BENZIANE Oct 03 '19 at 18:33
  • I provided suggestions on how to handle this (items #2 and #3). Remember that catching an error is not the same thing as designing logic to detect and recover from errors. – John Hanley Oct 03 '19 at 18:43
  • Well, thank you for your suggestions; they were useful. I am now handling errors separately, and it appears the problem was not really in my code: after I modified the memory parameter of some of the functions and fixed the number of instances at 50, the problem was solved. I have 5 functions in total; 2 are executed about once every 3 seconds, and the others about 200 times per second. So I tested my theory and it worked. What do you think of that? – Bilel Abderrahmane BENZIANE Oct 05 '19 at 16:32
  • Review pricing and quotas for BigQuery. You cannot hammer BigQuery with updates. This is more expensive than batch loads, and you will hit quotas that throttle/fail your requests (5 operations per 10 seconds). Do not treat BigQuery as a real-time or SQL database. BigQuery is designed as a serverless Big Data platform, which means that 99% of your requests should be queries. Do not attempt lots of small updates; see the batch-load sketch after this thread. https://cloud.google.com/bigquery/quotas – John Hanley Oct 05 '19 at 16:48
  • Yeah, I agree. I was exporting data from Stackdriver to BigQuery using sinks, but the format was not really flexible (especially with complex objects), which led me to Cloud Functions just to serve that purpose. I am not sure there is a better way, but I am still experimenting with everything I can find to get somewhere. Do you suggest something else? – Bilel Abderrahmane BENZIANE Oct 08 '19 at 18:39
  • Consider using a normal export from Stackdriver to BigQuery. Then run queries against the data stored in BigQuery and create a VIEW that you prefer. Faster and cheaper. – John Hanley Oct 08 '19 at 19:22
  • Well, that would have been my first choice, but the table names are generated automatically from the logName attribute, which cannot be altered in the logs (I checked), and that causes the following problem: when a new log entry with a different format is sent to BigQuery through a sink, it does not get inserted into the BigQuery table, since it is not in the same format. I checked and asked a lot on Stack Overflow and ran lots of experiments. There has to be some data-preprocessing middleware; I picked Cloud Functions, but there might be a better approach. – Bilel Abderrahmane BENZIANE Oct 08 '19 at 19:56
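
A minimal sketch of the credential check suggested in the first comment, assuming google-auth-library is available (it is already pulled in as a dependency of @google-cloud/bigquery); the verifyCredentials helper is a hypothetical name:

const {GoogleAuth} = require('google-auth-library');

// Hypothetical helper: force Application Default Credentials to load up front.
// getClient() rejects if ADC cannot be loaded, so rethrowing fails the
// invocation and lets Pub/Sub redeliver it (note that background functions
// only retry on failure when deployed with the --retry flag).
const verifyCredentials = async () => {
    const auth = new GoogleAuth();
    const client = await auth.getClient();
    const projectId = await auth.getProjectId();
    console.log(`ADC loaded for project ${projectId}`);
    return client;
};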
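
Similarly, a sketch of the batch-load approach from the quota comment, using the same @google-cloud/bigquery client as the question; the NDJSON file name is a placeholder. One load job replaces many streaming inserts, which avoids per-row streaming costs and quota pressure:

const {BigQuery} = require('@google-cloud/bigquery');
const bigquery = new BigQuery();

// Placeholder batch load: accumulate log entries into a newline-delimited
// JSON file (or a Cloud Storage object) and load them in a single job.
const batchLoadLocations = async () => {
    await bigquery
        .dataset('YassirBackendLoggingDev')
        .table('DriverLocationStatus')
        .load('driver-locations.ndjson', {
            sourceFormat: 'NEWLINE_DELIMITED_JSON',
            autodetect: true,
        });
};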

1 Answer


This looks like a potential logic error in your driverLocationStatusProcessing function:

try {
    const logObject = JSON.parse(Buffer.from(pubSubEvent.data, 'base64').toString());
    return insertIntoBigQueryClient(logObject);
    // ^^ add return statement
} catch (error) {
    console.error(error);
}
I'm not sure if this is the cause of your issue, but your comments point to a potential race condition ("it happens maybe once in a million"), and without that return, the await won't do what you expect it to: the function's promise resolves before the insert finishes, at which point Cloud Functions may throttle or recycle the instance and cut off the in-flight BigQuery call (and its credential lookup) mid-request.

This may also be relevant: Could not load the default credentials? (Node.js Google Compute Engine tutorial)

Travis Webb
  • Ohhh, seriously, thank you. I forgot the return in there. But even without the return, after increasing the memory from 256MB to 1GB and the number of instances to 50, I totally solved the problem. I guess I will reduce the memory again and add the return statement to see what happens. What do you think of that? – Bilel Abderrahmane BENZIANE Oct 05 '19 at 16:37
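
For reference, the memory and instance settings mentioned in this comment can be pinned in the deploy step itself. A sketch extending the question's Cloud Build config with the reported values (at the time, --max-instances was still a beta gcloud flag, so gcloud beta functions deploy may be needed):

steps:
  - name: 'gcr.io/cloud-builders/gcloud'
    args:
      - functions
      - deploy
      - 'driverLocationStatusProcessing$_ENVIRONMENT'
      - '--set-env-vars'
      - ENVIRONMENT=$_ENVIRONMENT
      - '--trigger-topic'
      - 'DriverLocationStatus$_ENVIRONMENT'
      - '--runtime'
      - nodejs8
      - '--timeout=540'
      - '--memory=1024MB'     # 1GB, as reported in the comment above
      - '--max-instances=50'  # cap on concurrent instances
    dir: 'driver-location-status'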