When I run the code below, I get a pyarrow error. I have installed pyarrow, but the error persists. I can access the table and see its schema, but to_dataframe() fails even though the code is copied from the Google BigQuery documentation.

from google.cloud import bigquery
from google.oauth2 import service_account

key_path = "path/to/key.json"  # placeholder: path to my personal service-account JSON file
credentials = service_account.Credentials.from_service_account_file(
    key_path, scopes=["https://www.googleapis.com/auth/cloud-platform"],
)
client = bigquery.Client(credentials=credentials, project=credentials.project_id,)
query = """
    select * 
    from `table` 
    limit 10;
"""
df = client.query(query).to_dataframe()  # I have also tried with df = client.query(query).result().to_dataframe()

I receive the following error when running it:

ValueError: The pyarrow library is not installed, please install pyarrow to use the to_arrow() function.
Jimmy
  • What happens if you run `import pyarrow` in a python shell? – Pace May 07 '21 at 03:15
  • No issue with the import. Same error message as before for the query. – Jimmy May 07 '21 at 05:58
  • I tried your code in my environment. Can you try `pip install --upgrade google-cloud-bigquery[pandas]`? This installs some libraries that might have been missed. Also check that the versions are compatible. In my environment, the library versions are **google-cloud-bigquery** 2.16.0 and **pyarrow** 3.0.0. I'm suggesting this because I wasn't able to reproduce the issue; your code worked perfectly fine for me. – Kabilan Mohanraj May 07 '21 at 07:14

2 Answers

Try this:

import pandas as pd

query_string = """ SELECT * FROM Table limit 10; """
df = pd.read_gbq(query_string, project_id, dialect='standard')

Adam
  • This is a different library, which I can't use because it requires authorization, and it's not a google-bigquery library. – Jimmy May 07 '21 at 06:02

I fixed this same issue. I also tried the command below, which did not work.

pip install --upgrade google-cloud-bigquery[pandas]

In the end, I removed all the packages in my virtualenv (actually, I just deleted the env folder) and then reinstalled them (actually, I just made a new virtualenv and installed the packages I needed).

After installing

pip install google-cloud-bigquery

the only extra thing I needed to do was:

pip install google-cloud-bigquery[pandas]

I'm sure you could just remove google-cloud-bigquery and its dependencies as a more elegant solution than deleting the virtualenv outright and remaking it.
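
Before rebuilding anything, it may help to check whether the interpreter that runs the query is the same one pyarrow was installed into. Below is a minimal sketch using only the standard library (Python 3.8+). The helper name `report_environment` is made up for illustration and is not part of any Google library, and the package list is just the three distributions mentioned in this thread.

```python
import sys
from importlib import metadata


def report_environment(packages=("google-cloud-bigquery", "pyarrow", "pandas")):
    """Return the interpreter path and the installed version of each package.

    A version of None means the package is not installed for this interpreter.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return {"python": sys.executable, "versions": versions}


if __name__ == "__main__":
    info = report_environment()
    print("Interpreter:", info["python"])
    for name, version in info["versions"].items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Run it with the same interpreter (or notebook kernel) that raises the ValueError. If pyarrow shows up as not installed there, the package probably went into a different environment than the one executing the query, which would be consistent with a fresh virtualenv fixing it.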

CrookedCloud