Can you please suggest ways to load my Google Analytics data from BigQuery into Redshift? Can a Cloud Function be used for this? Or, how do I trigger this from the BigQuery side instead of using a Python script to call BigQuery?
1 Answer
You can utilise two Cloud Functions to get the data into S3. Once it's in S3, you can use your own mechanism (e.g. a Lambda function) to import the data into Redshift.
Preamble: set up a Stackdriver export trigger
We will trigger our first Cloud Function whenever the latest Google Analytics daily sessions table becomes available. This is done by publishing a Pub/Sub message whenever Stackdriver Logging indicates that the latest table has been loaded. To set up this trigger, follow these steps (refer to the "Pub/Sub & Stackdriver" section).
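As a reference point, the export sink for this trigger might use a Stackdriver Logging filter along these lines (field names are from the legacy BigQuery audit-log format; verify them against your project's actual log entries, and replace the table-name match with your own):

```
resource.type="bigquery_resource"
protoPayload.methodName="jobservice.jobcompleted"
protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.load.destinationTable.tableId:"ga_sessions"
```

Point the sink's destination at a Pub/Sub topic and subscribe Cloud Function 1 to that topic.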
Cloud Function 1: Export BigQuery table
- Trigger: Pub/Sub message for when a new daily table has been loaded
- Workflow
- Export table as JSON (or Avro, Parquet)
- Save JSON in Google Cloud Storage
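A minimal sketch of Cloud Function 1, using the google-cloud-bigquery client library. The dataset and bucket names are placeholders, and the client import is kept inside the handler so the date helper can be exercised on its own:

```python
from datetime import date, timedelta


def export_uri(bucket, table_date):
    """Build the GCS destination URI for a given session date, e.g.
    gs://my-bucket/ga_sessions_20191208.json"""
    return "gs://{}/ga_sessions_{}.json".format(
        bucket, table_date.strftime("%Y%m%d"))


def export_ga_table(event, context):
    """Pub/Sub-triggered entry point: export yesterday's daily table."""
    from google.cloud import bigquery  # deferred import; sketch only

    client = bigquery.Client()
    yesterday = date.today() - timedelta(days=1)
    source = "my_dataset.ga_sessions_{}".format(yesterday.strftime("%Y%m%d"))

    job_config = bigquery.job.ExtractJobConfig(
        destination_format=bigquery.DestinationFormat.NEWLINE_DELIMITED_JSON)
    # Block until the extract job finishes so errors surface in the logs
    client.extract_table(source, export_uri("my-export-bucket", yesterday),
                         job_config=job_config).result()
```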
Cloud Function 2: Transfer export file to S3
- Trigger: New file in Google Cloud Storage bucket
- Workflow
- Using boto, read the file from Google Cloud Storage
- Transfer file to S3
- Delete or archive file
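Cloud Function 2 could look roughly like this. The S3 bucket name and key prefix are assumptions, and the cloud-client imports are deferred so the key helper can be tested without them:

```python
def s3_key(object_name, prefix="ga-exports"):
    """Map a GCS object name to the S3 key it will be stored under."""
    return "{}/{}".format(prefix, object_name)


def transfer_to_s3(event, context):
    """GCS-finalize-triggered entry point: copy the new export file to S3,
    then delete the original (or move it to an archive bucket instead)."""
    import boto3  # deferred imports; sketch only
    from google.cloud import storage

    blob = storage.Client().bucket(event["bucket"]).blob(event["name"])
    data = blob.download_as_string()  # fine for daily exports; stream if large

    boto3.client("s3").put_object(Bucket="my-redshift-staging",
                                  Key=s3_key(event["name"]), Body=data)
    blob.delete()
```

AWS credentials for boto3 would need to be supplied to the function, e.g. via environment variables.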

vinoaj
- Additionally, you can also use the method suggested [here](https://stackoverflow.com/questions/39329580/exporting-data-from-google-cloud-storage-to-amazon-s3) to copy from GCS to S3 – Louis C Nov 28 '19 at 22:20
- @vinoaj, thanks for this! I tried following the Pub/Sub steps and got stuck at Cloud Function 1, exporting the BigQuery table to GCS. But I get an error – updated in the question – Justine Nov 29 '19 at 15:37
- @Justine that looks like it might be a syntax error. Are you able to provide your whole code to debug? – vinoaj Nov 29 '19 at 20:17
- @Justine – re-looking at the error message, I suspect what you have is a multiline query statement. In that case your Python syntax should use triple quotation marks, i.e. QUERY="""......""" – vinoaj Nov 29 '19 at 20:48
- @vinoaj, thank you, that helps, but I'm getting another error. I have updated my question above to include the code. Do I need to do additional setup? – Justine Nov 30 '19 at 21:19
- @Justine that error message looks like you haven't installed the Google BigQuery Python library on your system. Try: pip install --upgrade google-cloud-bigquery – vinoaj Dec 01 '19 at 00:56
- @vinoaj, I managed to export successfully to GCS; however, I want to adjust the filename with a date format. Since I'm taking the load from Google Analytics data in BQ, I want to take data from yesterday, for example ga_session_date_yesterday -> ga_session_20191208. And if I wanted to export it with a different extension, like .gz, how does that work? Thanks a lot for your help. – Justine Dec 09 '19 at 10:42
- I solved the date issue, but I'm having an issue exporting the file into a readable CSV with .gz compression; updated query above – Justine Dec 09 '19 at 16:40
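For the last two comments, here is a sketch of both adjustments using the google-cloud-bigquery ExtractJobConfig options for CSV output with GZIP compression (the table ID and bucket name are placeholders):

```python
from datetime import date, timedelta


def dated_name(d):
    """Filename for a given session date, e.g. ga_sessions_20191208.csv.gz"""
    return "ga_sessions_{}.csv.gz".format(d.strftime("%Y%m%d"))


def export_csv_gz(table_id, bucket):
    """Export a table to GCS as a GZIP-compressed CSV named after yesterday."""
    from google.cloud import bigquery  # deferred import; sketch only

    client = bigquery.Client()
    job_config = bigquery.job.ExtractJobConfig(
        destination_format=bigquery.DestinationFormat.CSV,
        compression=bigquery.Compression.GZIP)
    yesterday = date.today() - timedelta(days=1)
    uri = "gs://{}/{}".format(bucket, dated_name(yesterday))
    client.extract_table(table_id, uri, job_config=job_config).result()
```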