I am trying to read a table from BigQuery:
from google.cloud import bigquery
import os
import pandas as pd

# Point the client at the service account key file
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'path/file.json'

client = bigquery.Client()

QUERY = """
SELECT *
FROM `project.dataset.table`
LIMIT 10
"""

query_job = client.query(QUERY)
df = query_job.to_dataframe()
This fails with:

----> 9 df = query_job.to_dataframe()

ValueError: The pyarrow library is not installed, please install pyarrow to use the to_arrow() function.
But I am able to run:
import pyarrow
dir(pyarrow)
['Array',
'ArrayValue',
'ArrowCapacityError'
...
]
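For completeness, this is the extra check I can run in the same notebook kernel to confirm which interpreter and which pyarrow build are actually being picked up (just the check itself, not output I already have):

import sys
import pyarrow

# Which Python interpreter the kernel is running, and which pyarrow it imports
print(sys.executable)
print(pyarrow.__version__)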
And when I run this:
query_job = client.query(
"""
SELECT *
FROM `project.dataset.table`
limit 10
"""
)
results = query_job.result()
pd.DataFrame(results)
I get:
0
0 (1732982, 4733, 4733, 1609674770000)
1 (1732982, 4733, 4733, 1609674771000)
2 (1732982, 4733, 4733, 1609674795000)
3 (1732982, 4733, 4733, 1609674977000)
4 (1732982, 4733, 4733, 1609675025000)
5 (1732982, 4733, 4733, 1609676040000)
6 (1732982, 4733, 4733, 1609676041000)
7 (1732982, 4733, 4733, 1609677347000)
8 (1732982, 4733, 4733, 1609677351000)
9 (1732982, 4733, 4733, 1609677781000)
Instead of:
row col_1 col_2 col_3 col_4
0 1732982 4733 4733 1609674770000
1 1732982 4733 4733 1609674771000
2 1732982 4733 4733 1609674795000
3 1732982 4733 4733 1609674977000
4 1732982 4733 4733 1609675025000
5 1732982 4733 4733 1609676040000
6 1732982 4733 4733 1609676041000
7 1732982 4733 4733 1609677347000
8 1732982 4733 4733 1609677351000
9 1732982 4733 4733 1609677781000
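The only workaround I can think of is building the frame from the row mappings myself, roughly as sketched below (assuming each Row exposes its field names via items(), as the client library documents), but I would still like to_dataframe() to work:

rows = client.query(QUERY).result()

# Turn each Row into a plain dict so pandas keeps the column names
df = pd.DataFrame([dict(row.items()) for row in rows])
print(df.head())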
I've read this and this, but neither solved my problem.
The end goal is to get a DataFrame back from BigQuery after querying it.
EDIT

Relevant package versions (from conda list):
pyarrow 3.0.0 pypi_0 pypi
numpy 1.19.2 py38h456fd55_0
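pyarrow 3.0.0 clearly shows up above, so my best guess is a kernel/environment mismatch; if the interpreter check earlier points at a different environment, I assume the fix would be reinstalling into the active kernel, roughly like this (not something I have confirmed yet):

%pip install --upgrade pyarrow google-cloud-bigquery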