15

I have tried installing it in the terminal and in Jupyter Lab, and it says that it has been successfully installed, but when I run df = query_job.to_dataframe() I keep getting the error "ValueError: The pyarrow library is not installed, please install pyarrow to use the to_arrow() function.". I have no idea how to fix this. Any advice? I am ultimately trying to access data from Google Data Studio with the code,

from google.cloud import bigquery
import pandas
import numpy
import pyarrow
import os

# Credentials must be set before the client is created.
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'full file path here.json'
bigquery_client = bigquery.Client()

QUERY = """
SELECT *
FROM `warehouse`
LIMIT 100
"""
query_job = bigquery_client.query(QUERY)
df = query_job.to_dataframe()
0x26res
Sarah Dodamead

6 Answers

12

I got the same error message ModuleNotFoundError: No module named 'pyarrow' when testing your Python code. This behavior disappeared after installing the pyarrow dependency with pip install pyarrow.

Edit: It worked for me once I restarted the kernel after running pip install pyarrow
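A common reason the error persists in Jupyter even after a successful pip install is that the notebook kernel runs a different interpreter than the terminal. A minimal sketch to check which environment the kernel actually sees (the printed paths are environment-specific):

```python
import sys

# Check whether pyarrow is importable from *this* interpreter.
# If it is not, restart the kernel, or install with `%pip install pyarrow`
# from inside the notebook so the package lands in the kernel's environment.
try:
    import pyarrow
    status = f"pyarrow {pyarrow.__version__} found in {sys.prefix}"
except ImportError:
    status = f"pyarrow is NOT visible to {sys.executable}; restart the kernel"

print(status)
```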

juferafo
7

I had the same issue. It was fixed after running the following:

pip install --upgrade 'google-cloud-bigquery[bqstorage,pandas]'

Source: https://cloud.google.com/bigquery/docs/bigquery-storage-python-pandas

Mohnish
Richard
3

I had the same issue because I had pyarrow 2.0 installed, whereas version 1.0.1 is required. Try running this line, which pins compatible versions: pip install pandas-gbq==0.14.0

0

To avoid using fetch_pandas_all(), I used fetchall() and then converted the result to a pandas DataFrame. I have used:

requirements.txt

snowflake-connector-python==2.4.3
pandas==1.2.4

dag.py

    def execute(self, **kwargs):
        """
        :param kwargs: optional parameter. Can be used to provide task input context
        :return: returns the query result as JSON
        """
        # Plain Snowflake connection; fetchall() avoids the pyarrow
        # dependency that fetch_pandas_all() requires.
        ctx = snowflake.connector.connect(
            user=self.SNOWFLAKE_USER,
            password=self.SNOWFLAKE_PASSWORD,
            account=self.SNOWFLAKE_ACCOUNT
        )
        cs = ctx.cursor()
        try:
            cs.execute(self.sql_query)
            data = cs.fetchall()  # list of row tuples
            df = pd.DataFrame(data)
            print(f'\nQUERY RESULT: \n'
                  f' {tabulate(df, headers="keys", tablefmt="psql", showindex="always")} \n')
        finally:
            cs.close()
            ctx.close()
        logging.info("Query executed successfully")
        # data is a list of tuples, so serialize it (json.loads would fail here)
        return json.dumps(data, default=str)
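The fetchall() pattern above works with any DB-API 2.0 cursor. A self-contained sketch using sqlite3 as a stand-in for the Snowflake cursor (the table and column names here are made up for illustration), with header names recovered from cursor.description rather than left as default integer columns:

```python
import sqlite3
import pandas as pd

# In-memory database standing in for the real warehouse connection.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE sales (region TEXT, total INTEGER)")
cur.executemany("INSERT INTO sales VALUES (?, ?)",
                [("north", 10), ("south", 25)])

cur.execute("SELECT region, total FROM sales")
rows = cur.fetchall()                          # list of row tuples
columns = [col[0] for col in cur.description]  # column names from the cursor
df = pd.DataFrame(rows, columns=columns)

print(df)
conn.close()
```

Passing columns= keeps the DataFrame headers meaningful; pd.DataFrame(rows) alone would label the columns 0, 1, ….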
0

I have experienced a similar problem, but I worked around it by building the pandas DataFrame myself:

client = bigquery.Client()
try:
    rows = client.query(query).result()  # RowIterator; no pyarrow needed
    df = pd.DataFrame([dict(row) for row in rows])
except ValueError:
    print("google services not available or invalid credentials.")

df.head()
Ryan M
-2

You just need to install pyarrow using pip:

pip install pyarrow

df = client.query(query1).to_dataframe()

print(df['total_transactions'][0])
print(df['total_visits'][0])
Suraj Rao
    this answer code only includes the copy of the post authors code. it does not show the solution. Please, add the underlying pip command – Tedo G. Apr 05 '21 at 19:46