
I am streaming data into BigQuery using the Python client library. The row of data lands in the BQ streaming buffer just fine, but when I run a query to view it I can only see the first letter of the value I have inserted.

Specifically, I run a snippet of Python like this:

from google.cloud import bigquery
client = bigquery.Client()
dataset_id = 'mydataset'
table_id = 'mytable'
table_ref = client.dataset(dataset_id).table(table_id)
table = client.get_table(table_ref)
rows_to_insert = [(u'testString')]
client.insert_rows(table, rows_to_insert)

Then when I run SELECT * FROM mytable, the result value only contains 't' instead of 'testString'.

I'm guessing this has something to do with the streaming buffer and should show me the entire value once it has been rewritten in BQ native format. But it would be great if someone could clarify it for me.

Balkan
  • This is odd and shouldn't happen. Querying the streaming buffer should return the full value of any column. Is your schema correct? – Graham Polley Aug 05 '19 at 13:06

1 Answer


When you stream data into BigQuery with insert_rows, each row must be a Python tuple (or a dict keyed by column name). To define a single-element tuple in Python, you need a trailing comma; parentheses alone are not enough. For example:

>>> type( ('a') )
<class 'str'>

>>> type( ('a',) )
<class 'tuple'>

As stated in this Stack Overflow answer.

The way you have it now, ('testString') is just a string, and because strings are iterable, insert_rows treats it as a sequence of per-column values: each character is mapped to a different column (in case you have more columns), and with a single-column table only the first character, 't', is stored.

Just replace rows_to_insert = [(u'testString')] with rows_to_insert = [(u'testString',)] and your string will be stored properly.
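You can verify the difference locally without touching BigQuery at all, since it comes down to how Python itself parses the row literal (the column names below are just for illustration):

```python
# A row written as ('testString') is just a str; iterating it yields
# one value per character, which is why only 't' reaches the table.
row_wrong = ('testString')   # parentheses alone do not make a tuple
row_right = ('testString',)  # the trailing comma does

print(type(row_wrong).__name__)  # str
print(type(row_right).__name__)  # tuple

# What insert_rows effectively sees when it iterates each row:
print(list(row_wrong))   # ['t', 'e', 's', 't', 'S', 't', 'r', 'i', 'n', 'g']
print(list(row_right))   # ['testString']
```

The same pitfall applies to any single-column insert; with two or more values per row the comma is already there, so the problem only surfaces with one-element rows.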

Andrei Cusnir