
I've used the sample from the BQ documentation to read a BQ table into a pandas dataframe using this query:

query_string = """
SELECT
CONCAT(
    'https://stackoverflow.com/questions/',
    CAST(id as STRING)) as url,
view_count
FROM `bigquery-public-data.stackoverflow.posts_questions`
WHERE tags like '%google-bigquery%'
ORDER BY view_count DESC
"""

dataframe = (
    bqclient.query(query_string)
    .result()
    .to_dataframe(bqstorage_client=bqstorageclient)
)
print(dataframe.head())

                                            url  view_count
0  https://stackoverflow.com/questions/22879669       48540
1  https://stackoverflow.com/questions/13530967       45778
2  https://stackoverflow.com/questions/35159967       40458
3  https://stackoverflow.com/questions/10604135       39739
4  https://stackoverflow.com/questions/16609219       34479
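
For reference, the two clients in the snippet above are created following the docs sample, roughly like this (the exact credential setup may differ in your environment):

import google.auth
from google.cloud import bigquery
from google.cloud import bigquery_storage_v1beta1

# Application-default credentials with the cloud-platform scope
credentials, project_id = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)

# Standard BigQuery client (runs the query) and the Storage API client
# (downloads the result rows over gRPC)
bqclient = bigquery.Client(credentials=credentials, project=project_id)
bqstorageclient = bigquery_storage_v1beta1.BigQueryStorageClient(
    credentials=credentials
)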

However, the minute I try to use any other non-public dataset, I get the following error:

google.api_core.exceptions.FailedPrecondition: 400 there was an error creating the session: the table has a storage format that is not supported

Is there some setting I need to set in my table so that it can work with the BQ Storage API?

This works:

query_string = """
SELECT funding_round_type, count(*)
FROM `datadocs-py.datadocs.investments`
GROUP BY funding_round_type
ORDER BY 2 DESC
LIMIT 2
"""

>>> bqclient.query(query_string).result().to_dataframe()

  funding_round_type     f0_
0            venture  104157
1               seed   43747

However, when I set it to use the bqstorageclient, I get this error:

>>> bqclient.query(query_string).result().to_dataframe(bqstorage_client=bqstorageclient)

Traceback (most recent call last):
  File "/Users/david/Desktop/V/lib/python3.6/site-packages/google/api_core/grpc_helpers.py", line 57, in error_remapped_callable
    return callable_(*args, **kwargs)
  File "/Users/david/Desktop/V/lib/python3.6/site-packages/grpc/_channel.py", line 533, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/Users/david/Desktop/V/lib/python3.6/site-packages/grpc/_channel.py", line 467, in _end_unary_response_blocking
    raise _Rendezvous(state, None, None, deadline)
grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with:
    status = StatusCode.FAILED_PRECONDITION
    details = "there was an error creating the session: the table has a storage format that is not supported"
    debug_error_string = "{"created":"@1565047973.444089000","description":"Error received from peer","file":"src/core/lib/surface/call.cc","file_line":1017,"grpc_message":"there was an error creating the session: the table has a storage format that is not supported","grpc_status":9}"
>
David542
  • Just a hunch, but maybe you're hitting this limitation? _"During the beta period, the BigQuery Storage API is accessible only in the US and EU multi-region locations."_ (https://cloud.google.com/bigquery/docs/reference/storage/) – Graham Polley Aug 06 '19 at 09:31
  • @GrahamPolley no it's US. All my datasets are US – David542 Aug 06 '19 at 18:45
  • @GrahamPolley honestly my guess is it's a permissions issue based on trying to start the `bqstorage` read session, and it gives an erroneous error message, but just a guess...I wonder if anyone else has had this? – David542 Aug 06 '19 at 21:25
  • and your datasets are definitely US multi-region, not single-region? If so, I'm stumped. I don't have a lot of experience with the storage API. It's quite new. Hopefully some of the BQ engineers can chime in. – Graham Polley Aug 06 '19 at 21:44

1 Answer


I experienced the same issue as of 06 Nov 2019, and it turns out the error you are getting is a known issue with the Read API: it currently cannot handle result sets smaller than 10MB. I came across this issue that sheds some light on the problem: GitHub.com - GoogleCloudPlatform/spark-bigquery-connector - FAILED_PRECONDITION: there was an error creating the session: the table has a storage format that is not supported #46

I tested it with a query that returns a result set larger than 10MB, and it works fine for me, with the dataset I am querying against in an EU multi-regional location.
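
Until that limitation is lifted, one workaround (just a sketch on my side, not an official fix) is to catch the FailedPrecondition error and fall back to the standard REST download, which handles small result sets fine:

from google.api_core.exceptions import FailedPrecondition

def query_to_dataframe(bqclient, bqstorageclient, query_string):
    # Try the faster Storage API download first.
    result = bqclient.query(query_string).result()
    try:
        return result.to_dataframe(bqstorage_client=bqstorageclient)
    except FailedPrecondition:
        # Result set too small for the Read API; re-run the query (served
        # from BigQuery's cache) and download it over the regular REST path.
        return bqclient.query(query_string).result().to_dataframe()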

Also, you will need to install fastavro in your environment for this functionality to work.
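
In my environment, something like this did the trick (exact package versions may differ for you):

pip install --upgrade google-cloud-bigquery-storage fastavro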

jayBana