I'm following this data prediction using Cloud ML Engine with scikit-learn tutorial for GCP AI Platforms. I tried to make an API call to BigQuery with:
def query_to_dataframe(query):
import pandas as pd
import pkgutil
privatekey = pkgutil.get_data('trainer', 'privatekey.json')
print(privatekey[:200])
return pd.read_gbq(query,
project_id=PROJECT,
dialect='standard',
private_key=privatekey)
but got the following error:
Traceback (most recent call last):
[...]
TypeError: a bytes-like object is required, not 'str'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/root/.local/lib/python3.7/site-packages/trainer/task.py", line 66, in <module>
arguments['numTrees']
File "/root/.local/lib/python3.7/site-packages/trainer/model.py", line 119, in train_and_evaluate
train_df, eval_df = create_dataframes(frac)
File "/root/.local/lib/python3.7/site-packages/trainer/model.py", line 95, in create_dataframes
train_df = query_to_dataframe(train_query)
File "/root/.local/lib/python3.7/site-packages/trainer/model.py", line 82, in query_to_dataframe
private_key=privatekey)
File "/usr/local/lib/python3.7/dist-packages/pandas/io/gbq.py", line 149, in read_gbq
credentials=credentials, verbose=verbose, private_key=private_key)
File "/root/.local/lib/python3.7/site-packages/pandas_gbq/gbq.py", line 846, in read_gbq
dialect=dialect, auth_local_webserver=auth_local_webserver)
File "/root/.local/lib/python3.7/site-packages/pandas_gbq/gbq.py", line 184, in __init__
self.credentials = self.get_credentials()
File "/root/.local/lib/python3.7/site-packages/pandas_gbq/gbq.py", line 193, in get_credentials
return self.get_service_account_credentials()
File "/root/.local/lib/python3.7/site-packages/pandas_gbq/gbq.py", line 413, in get_service_account_credentials
"Private key is missing or invalid. It should be service "
pandas_gbq.gbq.InvalidPrivateKeyFormat: Private key is missing or invalid. It should be service account private key JSON (file path or string contents) with at least two keys: 'client_email' and 'private_key'. Can be obtained from: https://console.developers.google.com/permissions/serviceaccounts
When the package runs in local environment, the private key loads fine, but when submitted as a ml-engine training job, the error occurs. Note that the private key fails to load only when I use GCP RUNTIME_VERSION="1.15" and PYTHON_VERSION="3.7", but can load with no problem when I use PYTHON_VERSION="2.7".
In case it's useful, the structure of my package is:
/babyweight
- setup.py
- trainer
- __init__.py
- model.py
- privatekey.json
- task.py
I'm not sure if the problem is due to a bug in Python, or where I placed privatekey.json
.