
It's been a while since I've had a chance to work on the pandas GBQ module, but I noticed that one of our regression tests is now failing.

The test in question is:

https://github.com/pydata/pandas/blob/master/pandas/io/tests/test_gbq.py#L254-L267

In short, the test attempts to create a table with 5 columns (Boolean, Float, String, Integer, Timestamp) and 1,000,001 rows. Inserting these rows in chunks of 10,000 fails with a "Request Too Large" response.
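For reference, the failing pattern looks roughly like this (a minimal sketch, not the actual test code; the column names, project, and table are placeholders, and I'm assuming the pandas.io.gbq interface of this era):

    import numpy as np
    import pandas as pd
    import pandas.io.gbq as gbq

    test_size = 1000001
    df = pd.DataFrame({
        'bools': np.random.randn(test_size) > 0,           # Boolean
        'flts': np.random.randn(test_size),                # Float
        'strs': np.random.randn(test_size).astype(str),    # String
        'ints': np.random.randint(0, 10, size=test_size),  # Integer
        'times': pd.date_range('2015-01-01', periods=test_size, freq='S'),  # Timestamp
    })

    # Streams the frame to BigQuery in chunks of 10,000 rows; this is the
    # call that comes back with "Request Too Large".
    gbq.to_gbq(df, 'my_dataset.my_table', project_id='my-project',
               chunksize=10000)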

I suspect this will have a similar answer to Getting "Query too large" in BigQuery, but since this test was passing previously, I'm wondering whether there's a backend problem that needs to be addressed. It's also possible the API changed when I wasn't looking!

TL;DR version: What about our insert is too large, and are there documented limits we can reference?

Jacob Schaer

1 Answer


The documented limits are here:

https://cloud.google.com/bigquery/streaming-data-into-bigquery#quota

The TL;DR answer: while BigQuery is not strictly enforcing the documented maximum of 500 rows per insert request at this time, other limits elsewhere in the API stack on overall request size are preventing the call from succeeding.
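If you want to stay safely inside the documented quota, the simplest change on the pandas side is to shrink the chunk size. A sketch only, not a tested fix: df and the project/table names are placeholders, and this assumes the same to_gbq interface used above.

    import pandas.io.gbq as gbq

    # 500 rows per insert request is the documented maximum on the quota
    # page above, so chunk at or below that instead of the 10,000-row
    # default used by the failing test.
    gbq.to_gbq(df, 'my_dataset.my_table', project_id='my-project',
               chunksize=500)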

shollyman
  • Yep - pandas uses the BigQuery streaming API for uploads, and the docs even mention that you can get errors based on size: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.gbq.to_gbq.html?highlight=gbq#pandas.io.gbq.to_gbq – Jabberwockey Jan 07 '15 at 18:28
  • We knew about the limits when writing the original GBQ library and the corresponding docs; it just seemed weird that a test that was previously passing would suddenly have issues. I'm particularly concerned about the 500 rows/insert limit. Interestingly, on trying again today it worked, for the first time in at least a month. I wonder if something was corrected. – Jacob Schaer Jan 08 '15 at 05:15