I have several Django (Python) back-end web applications from which I would like to start piping data into Google BigQuery in an automated fashion. The relational database on the back end is MySQL; these applications are not public-facing and are not hosted on Google App Engine.

We already have Google Apps for Business along with a Google BigQuery project set up. I can manually dump tables to CSV and import them into BigQuery, but are there any best practices for automating this kind of data delivery into Google? I've pored over the documentation and don't see any definitive writing on this matter.

Any advice would be appreciated.

Thanks for reading

xXPhenom22Xx

2 Answers

WePay recently started a series of articles on how they use BigQuery to run their analytics. Their second article highlights how they use Apache Airflow to move data from MySQL to BigQuery.

As they mention, "We have only a single config-driven ETL DAG file. It dynamically generates over 200 DAGs", and "The most important part is the select block. This defines which columns we pull from MySQL and load into BigQuery".
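This is not WePay's actual code, but a minimal sketch of the same config-driven pattern using the Airflow Google provider operators; the table list, SELECT statements, connection ID, bucket, and dataset names below are assumptions for illustration only:

    from datetime import datetime
    from airflow import DAG
    from airflow.providers.google.cloud.transfers.mysql_to_gcs import MySQLToGCSOperator
    from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

    # Config-driven: one entry per table, with the SELECT defining which columns move.
    TABLES = {
        "orders": "SELECT id, user_id, amount, created_at FROM orders",
        "users": "SELECT id, email, created_at FROM users",
    }

    with DAG("mysql_to_bigquery", start_date=datetime(2024, 1, 1),
             schedule_interval="@daily", catchup=False) as dag:
        for table, select_sql in TABLES.items():
            # Export the MySQL query result to newline-delimited JSON in GCS.
            # {{ ds }} is templated by Airflow into the execution date at runtime.
            extract = MySQLToGCSOperator(
                task_id=f"extract_{table}",
                mysql_conn_id="mysql_default",    # assumed connection ID
                sql=select_sql,
                bucket="my-etl-bucket",           # assumed GCS bucket
                filename=f"{table}/{{{{ ds }}}}.json",
                export_format="json",
            )
            # Load the exported file into BigQuery, replacing the table contents.
            load = GCSToBigQueryOperator(
                task_id=f"load_{table}",
                bucket="my-etl-bucket",
                source_objects=[f"{table}/{{{{ ds }}}}.json"],
                destination_project_dataset_table=f"my_dataset.{table}",
                source_format="NEWLINE_DELIMITED_JSON",
                write_disposition="WRITE_TRUNCATE",
                autodetect=True,
            )
            extract >> load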

See the article for more details.

Felipe Hoffa

You can use Python scripts ("robots") that run on Linux, scheduled with crontab.

For loading into Google Cloud Platform BigQuery, I use the pandas_gbq.to_gbq function:

  • Create your dataframe (df) according to this or this (a minimal pandas.read_sql sketch is shown after this list)
  • In order to get the token.json file:

    • Create a Google Cloud Platform BigQuery service account.
    • Load the JSON file:

       from google.oauth2 import service_account
       import os
       import pandas as pd
       import pandas_gbq

       # Locate the service account key file next to this script
       DIR = os.path.dirname(os.path.realpath(__file__))
       TOKEN_AUTH = os.path.join(DIR, 'token.json')
       CREDENTIALS = service_account.Credentials.from_service_account_file(TOKEN_AUTH)

       # df is a pandas DataFrame holding the rows to load
       pandas_gbq.to_gbq(df, '<dataset>.<table_name>', project_id='<project_id>',
                         if_exists='replace',  # or 'append'
                         credentials=CREDENTIALS)

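As a minimal sketch of the dataframe step (the connection string, table, and column names here are assumptions, not part of the original answer), you could pull the MySQL rows with pandas and SQLAlchemy:

    import pandas as pd
    from sqlalchemy import create_engine

    # Hypothetical MySQL connection string; requires the pymysql driver.
    engine = create_engine('mysql+pymysql://user:password@localhost/mydb')

    # Pull only the columns you want to load into BigQuery.
    df = pd.read_sql('SELECT id, email, created_at FROM users', engine)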
Once you have created your token, install crontab on Linux and schedule your load-robot task.
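As a sketch, a crontab entry along these lines (the script path is hypothetical) would run the load every night at 02:00:

    # m h dom mon dow  command
    0 2 * * * /usr/bin/python3 /home/user/load_to_bigquery.py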

Finally, you can also use Apache Airflow (for advanced users with Docker skills).

Victor