import pandas as pd
from google.cloud import bigquery
import google.auth

# Create credentials with Drive & BigQuery API scopes
# Both APIs must be enabled for your project before running this code
credentials, project = google.auth.default(scopes=[
    'https://www.googleapis.com/auth/drive',
    'https://www.googleapis.com/auth/spreadsheets',
    'https://www.googleapis.com/auth/bigquery',
])
client = bigquery.Client(credentials=credentials, project=project)

# Configure the external data source and query job
external_config = bigquery.ExternalConfig('GOOGLE_SHEETS')
# Use a shareable link or grant viewing access to the email address you
# used to authenticate with BigQuery (this example Sheet is public)
sheet_url = (
    'https://docs.google.com/spreadsheets'
    '/d/1uknEkew2C3nh1JQgrNKjj3Lc45hvYI2EjVCcFRligl4/edit?usp=sharing')
external_config.source_uris = [sheet_url]
external_config.schema = [
    bigquery.SchemaField('name', 'STRING'),
    bigquery.SchemaField('post_abbr', 'STRING')
]
external_config.options.skip_leading_rows = 1  # optionally skip header row
table_id = 'BambooHRActiveRoster'
job_config = bigquery.QueryJobConfig()
job_config.table_definitions = {table_id: external_config}

# Get Top 10
sql = 'SELECT * FROM workforce.BambooHRActiveRoster LIMIT 10'
query_job = client.query(sql, job_config=job_config)  # API request

top10 = list(query_job)  # Waits for query to finish
print('Retrieved {} rows.'.format(len(top10)))

The error I get is:

BadRequest: 400 Error while reading table: workforce.BambooHRActiveRoster, error message: Failed to read the spreadsheet. Errors: No OAuth token with Google Drive scope was found.

I can pull data in from a BigQuery table created from CSV upload, but when I have a BigQuery table created from a linked Google Sheet, I continue to receive this error.

I have tried to replicate the sample in Google's documentation (Creating and querying a temporary table):

https://cloud.google.com/bigquery/external-data-drive

AME

4 Answers


You are authenticating as yourself, which is generally fine for BigQuery if you have the correct permissions. Querying tables linked to Google Sheets, however, often requires a service account. Create one (or have your BI/IT team create one), share the underlying Google Sheet with the service account, and then modify your Python script to use the service account credentials instead of your own (see the sketch below).
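As a minimal sketch, assuming you have downloaded the service account's JSON key (here called service-account.json, a placeholder) and shared the Sheet with its email address, the authentication portion of your script would become something like:

from google.oauth2 import service_account
from google.cloud import bigquery

# 'service-account.json' is a placeholder for your downloaded key file.
# Both the BigQuery and Drive scopes are needed to read a Sheets-backed table.
credentials = service_account.Credentials.from_service_account_file(
    'service-account.json',
    scopes=[
        'https://www.googleapis.com/auth/bigquery',
        'https://www.googleapis.com/auth/drive',
    ],
)
client = bigquery.Client(credentials=credentials,
                         project=credentials.project_id)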

The quick way around this is to use the BQ interface: SELECT * from the Sheets-linked table, save the results to a new table, and query that new table directly in your Python script. This works well if this is a one-time upload/analysis. If the data in the Sheet will change regularly and you need to query it routinely, it is not a long-term solution.
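If you go that route, the equivalent SQL to run in the BigQuery console would be roughly the following (roster_snapshot is just an assumed name for the new table):

CREATE TABLE workforce.roster_snapshot AS
SELECT *
FROM workforce.BambooHRActiveRoster;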

rtenha
  • Thanks @rtenha. Can you please refer me to a tutorial on creating a service account and link the underlying google sheet? – AME Jul 10 '19 at 19:03
  • I think Juve's answer here (https://stackoverflow.com/questions/40391128/how-do-i-authenticate-a-service-account-to-make-queries-against-a-gdrive-sheet-b) will be a good starting point. Otherwise I would check out the first few google results here (https://www.google.com/search?q=bigquery+service+account+google+sheet). – rtenha Jul 10 '19 at 19:17

I solved the problem by passing the scopes when creating the credentials for the client.

from google.cloud import bigquery
import google.auth

credentials, project = google.auth.default(scopes=[
    'https://www.googleapis.com/auth/drive',
    'https://www.googleapis.com/auth/bigquery',
])
CLIENT = bigquery.Client(project=project, credentials=credentials)

https://cloud.google.com/bigquery/external-data-drive
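Note that passing scopes to google.auth.default() only has an effect when the underlying credentials support re-scoping (for example, a service account); user credentials created by gcloud are granted their scopes at login time, in which case see the gcloud auth application-default login answer below.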

Damon Liao

import pandas as pd
from google.oauth2 import service_account
from google.cloud import bigquery

SCOPES = ['https://www.googleapis.com/auth/drive','https://www.googleapis.com/auth/bigquery']
SERVICE_ACCOUNT_FILE = 'mykey.json'

credentials = service_account.Credentials.from_service_account_file(
        SERVICE_ACCOUNT_FILE, scopes=SCOPES)

# Optional: impersonate another identity via domain-wide delegation.
# If the key already belongs to the account you want to query as, you can
# skip this step and pass `credentials` to the client directly.
delegated_credentials = credentials.with_subject('myserviceaccount@domain.iam.gserviceaccount.com')

client = bigquery.Client(credentials=delegated_credentials,
                         project=credentials.project_id)

sql = 'SELECT * FROM `myModel`'

DF = client.query(sql).to_dataframe()
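One more thing to check: the Sheet itself still has to be shared with the service account's client_email (you can find it in the JSON key file), otherwise the same Drive-scope error will come back.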
AME
  • For some reason, upgrading from python2 to python3 on appengine, I needed to use service account files to make it work again. Your example worked for me too. Thanks! I just edited out the `delegated_credentials` part as I created a service account file directly from the default account. – emerix Apr 22 '20 at 12:32

You can try to update your default credentials through the console:

gcloud auth application-default login --scopes=https://www.googleapis.com/auth/userinfo.email,https://www.googleapis.com/auth/drive,https://www.googleapis.com/auth/cloud-platform
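
After re-running this command and completing the browser flow, the refreshed application default credentials will include the Drive scope, so google.auth.default() in the original script should pick them up without any code changes.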