I am trying to read a table from BigQuery into a pandas DataFrame:

from google.cloud import bigquery
import os
import pandas as pd

os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'path/file.json'
client = bigquery.Client()


QUERY = """
SELECT * 
FROM `project.dataset.table`
limit 10
"""
query_job = client.query(QUERY)
df = query_job.to_dataframe()

The last line fails with:

----> 9 df = query_job.to_dataframe()
ValueError: The pyarrow library is not installed, please install pyarrow to use the to_arrow() function.

But I am able to run:

import pyarrow
dir(pyarrow)

['Array',
 'ArrayValue',
 'ArrowCapacityError'
  ...
]

And when I run this:

query_job = client.query(
    """
    SELECT *
    FROM `project.dataset.table`
    limit 10
    """
)

results = query_job.result()
pd.DataFrame(results)

I get:

                                      0
0   (1732982, 4733, 4733, 1609674770000)
1   (1732982, 4733, 4733, 1609674771000)
2   (1732982, 4733, 4733, 1609674795000)
3   (1732982, 4733, 4733, 1609674977000)
4   (1732982, 4733, 4733, 1609675025000)
5   (1732982, 4733, 4733, 1609676040000)
6   (1732982, 4733, 4733, 1609676041000)
7   (1732982, 4733, 4733, 1609677347000)
8   (1732982, 4733, 4733, 1609677351000)
9   (1732982, 4733, 4733, 1609677781000)

Instead of:

row   col_1  col_2   col_3          col_4                            
0   1732982   4733   4733   1609674770000
1   1732982   4733   4733   1609674771000
2   1732982   4733   4733   1609674795000
3   1732982   4733   4733   1609674977000
4   1732982   4733   4733   1609675025000
5   1732982   4733   4733   1609676040000
6   1732982   4733   4733   1609676041000
7   1732982   4733   4733   1609677347000
8   1732982   4733   4733   1609677351000
9   1732982   4733   4733   1609677781000

I've read this and this, but neither solved my problem.

The goal is to get a DataFrame with named columns from the result of a BigQuery query.
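
For reference, the shape I am after can be sketched without pyarrow by pairing each row's values with the column names. This is only a minimal sketch: `rows_to_records` is a hypothetical helper, the sample rows are copied from the output above, and the column names are the ones from the expected table. With a real query I assume the names could come from `results.schema` (each field's `.name`) and the values from `row.values()`, but I have not confirmed that in my environment.

```python
def rows_to_records(rows, column_names):
    """Pair each row's values with the column names, one dict per row.

    Hypothetical helper for illustration; not part of google-cloud-bigquery.
    """
    records = []
    for row in rows:
        values = tuple(row)
        if len(values) != len(column_names):
            raise ValueError(
                f"expected {len(column_names)} values, got {len(values)}"
            )
        records.append(dict(zip(column_names, values)))
    return records


# Stand-in data copied from the output shown in the question.
# Assumption: with a real query these would be built from query_job.result(),
# e.g. tuple(row.values()) for each row.
sample_rows = [
    (1732982, 4733, 4733, 1609674770000),
    (1732982, 4733, 4733, 1609674771000),
]
column_names = ["col_1", "col_2", "col_3", "col_4"]

records = rows_to_records(sample_rows, column_names)
print(records[0])
```

Passing `records` to `pd.DataFrame(records)` should then give one named column per field instead of a single column of tuples.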

EDIT

pyarrow 3.0.0 pypi_0 pypi

numpy 1.19.2 py38h456fd55_0

Jonas Palačionis
  • What version of pyarrow and numpy do you have installed? Can you show the output of `pip list` or `conda list`? – joris Feb 04 '21 at 14:09
  • @joris, I've added pyarrow and numpy versions, do you still need the full output of `conda list`? – Jonas Palačionis Feb 04 '21 at 14:24
  • Looking at the GitHub code, it seems the only way to get this error is if the library fails to `import pyarrow`. You should refresh your env, restart your kernel, etc. to ensure that pyarrow is indeed available – Fabich Feb 04 '21 at 16:34
  • @JonasPalačionis How did you install pyarrow? Was it via a `conda install`? Try uninstalling and reinstalling via `conda-forge` using the `conda install -c conda-forge pyarrow` command – Pace Feb 04 '21 at 18:03
  • @Pace, thanks, I used your suggestions but now I get `TypeError: to_pandas() got an unexpected keyword argument 'timestamp_as_object'`. Any ideas on that? – Jonas Palačionis Feb 05 '21 at 07:53
  • Can you show the full traceback in the question? – joris Feb 05 '21 at 16:33
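
One way to follow up on the comments about the environment is to check, from inside the same kernel that raises the error, which interpreter is running and whether each package actually imports there. The snippet below is a rough diagnostic sketch, not a fix; `describe_module` is a hypothetical helper name, and the list of packages checked is an assumption based on the imports in the question.

```python
import importlib
import sys


def describe_module(name):
    """Return (version, file path) for an importable module, or None if the import fails."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        # Covers a missing package as well as one that fails while loading.
        return None
    version = getattr(module, "__version__", "unknown")
    location = getattr(module, "__file__", "unknown")
    return version, location


# The interpreter path shows which environment the kernel is really using.
print("interpreter:", sys.executable)

# Package names taken from the imports in the question.
for name in ("pyarrow", "pandas", "numpy", "google.cloud.bigquery"):
    print(name, "->", describe_module(name))
```

If the path printed for pyarrow sits in a different environment from `sys.executable`, or pyarrow shows `None` here while it imports fine in another shell, that would point to the kernel using a different environment than the one pyarrow was installed into.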
