
When trying to upload a Parquet file into BigQuery, I get this error:

Error while reading data, error message: Read less values than expected from: prod-scotty-45ecd3eb-e041-450c-bac8-3360a39b6c36; Actual: 0, Expected: 10 

I don't know why I get the error.

I tried inspecting the file with parquet-tools and it prints the file contents without issues.

The Parquet file is written with the parquetjs JavaScript library.

Update: I also filed this in the BigQuery issue tracker here: https://issuetracker.google.com/issues/145797606

Dobes Vandermeer

2 Answers


It turns out BigQuery doesn't support version 2 of the Parquet format. I changed the writer's output so it no longer uses the version 2 format, and BigQuery accepted the file.

Dobes Vandermeer
  • Did you file a request to get this format supported on the issue tracker? If you haven't, I will. Any sample file? – Felipe Hoffa Dec 08 '19 at 22:22
  • Parquet V2 is not considered production-ready. If ParquetJS writes it by default, you should ask them to change that – Wes McKinney Dec 09 '19 at 12:51
  • @FelipeHoffa I filed https://issuetracker.google.com/issues/145797606 for this issue, and left a comment that parquet v2 was the problem – Dobes Vandermeer Dec 09 '19 at 23:08
  • @WesMcKinney Yeah, that might be a good idea. Ideally BigQuery would have rejected the file with a better error in this case anyway. – Dobes Vandermeer Dec 09 '19 at 23:13

From the error message, it seems like a stray line break in the data might be causing this.

We use Dataprep to clean up our data, and it works quite well. If I'm not mistaken, it's also Google's recommended method for cleaning up and sanitising data for BigQuery.

https://cloud.google.com/dataprep/docs/html/BigQuery-Data-Type-Conversions_102563896

Parth Mehta
  • I don't have any affiliation with Dataprep or Google Cloud. However, I have used the product and am a big fan. I could have worded it a little better, so I apologise if it came across like that. I'd appreciate a more constructive response or suggestion; I'm only trying to help here :) – Parth Mehta Dec 09 '19 at 12:56