
Can anyone explain how to use Python to join tables that live in different BigQuery projects, where each project has its own service account and a separate key?

import logging

from google.cloud import bigquery


def get_data_from_bigquery():

    creds = 'key.json'
    client = bigquery.Client.from_service_account_json(creds)  # Project1 key
    logging.info('Starting BQ data fetch')

    testquery = """
        INSERT INTO `Project1.Dataset.Table`
        SELECT DISTINCT o.VName, o.RName, a.Id, a.Name
        FROM `Project1.Dataset.Table` o
        INNER JOIN `Project2.Dataset.Table` a ON o.Id = a.Id
        """

    job_config = bigquery.QueryJobConfig()

    testquery_query_job = client.query(testquery, job_config=job_config)

    testquery_query_job.result()  # block until the query job completes
    return True

1 Answer


Technically you can do this, and it is independent of Python.

First of all, you need to grant one of the service accounts access to the other project, and then run the query as that service account, which then has access to both projects/resources.

To give the service account behind creds = 'key.json' access to Project2 (for example, the BigQuery Data Viewer role), you can go to https://console.cloud.google.com/iam-admin/iam/

Then you don't need to do any magic in your Python code, as authorization is handled under the hood by BigQuery; see the sketch below.
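
For example, once the Project1 service account can read Project2, the cross-project join from the question runs with a single client. A minimal sketch, reusing the placeholder key file and table names from the question:

import logging

from google.cloud import bigquery

# One client, authenticated with the Project1 service account key.
# No extra credentials are needed for Project2 once the IAM grant exists.
client = bigquery.Client.from_service_account_json('key.json')

query = """
    SELECT DISTINCT o.VName, o.RName, a.Id, a.Name
    FROM `Project1.Dataset.Table` o
    INNER JOIN `Project2.Dataset.Table` a ON o.Id = a.Id
    """

# BigQuery resolves the `Project2...` reference using the caller's IAM
# permissions, so the join itself is ordinary standard SQL.
for row in client.query(query).result():
    logging.info('%s %s %s %s', row.VName, row.RName, row.Id, row.Name)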

------ update ------

As you are not allowed to do this at the BigQuery level, which would be the recommended route because of its performance benefits, you can achieve it with the pandas library. You need to run two separate queries, join (merge) the results within your Python code, and then insert the result into BigQuery; a minimal sketch follows the links below.

Merging example here: https://github.com/GoogleCloudPlatform/training-data-analyst/blob/master/CPB100/lab4a/demandforecast.ipynb (search for "merge").

Inserting is covered in "Write a Pandas DataFrame to Google Cloud Storage or BigQuery" (search for "table.insert").
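
A rough sketch of that pandas route, assuming separate key files per project; the key file names, table names, and the join column Id are placeholders carried over from the question:

import pandas as pd
from google.cloud import bigquery

# Separate clients, one per project, each with its own key file.
client1 = bigquery.Client.from_service_account_json('project1-key.json')
client2 = bigquery.Client.from_service_account_json('project2-key.json')

# Run one query per project and pull each result set into a DataFrame.
df1 = client1.query(
    'SELECT Id, VName, RName FROM `Project1.Dataset.Table`').to_dataframe()
df2 = client2.query(
    'SELECT Id, Name FROM `Project2.Dataset.Table`').to_dataframe()

# Join locally, mirroring the INNER JOIN ... ON o.Id = a.Id from the
# original query; drop_duplicates stands in for SELECT DISTINCT.
merged = pd.merge(df1, df2, on='Id', how='inner').drop_duplicates()

# Load the merged frame back into the destination table in Project1.
job = client1.load_table_from_dataframe(merged, 'Project1.Dataset.Table')
job.result()  # wait for the load job to finish

Note that this pulls both result sets into local memory, so it only scales to data that fits on the machine running the script, which is why doing the join inside BigQuery is preferable when the permissions allow it.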
