
The Python BigQuery API indicates that arrays are supported; however, when loading a pandas DataFrame into BigQuery, the load fails with a pyarrow struct error.

The only way around it, it seems, is to drop the nested columns and then use json_normalize to load them into a separate table (a sketch of that workaround follows).
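
A minimal sketch of that flattening workaround, assuming a hypothetical `details` column of nested dicts (the column and key names are illustrative, not from the original code):

    import pandas as pd

    # Hypothetical parent frame: "details" holds nested dicts, which pyarrow
    # would otherwise serialize as a struct column.
    appended_data = pd.DataFrame({
        'id': [1, 2],
        'details': [{'city': 'London', 'score': 7}, {'city': 'Leeds', 'score': 9}],
    })

    # Flatten the nested column into its own frame (older pandas versions
    # expose this as pandas.io.json.json_normalize).
    details = pd.json_normalize(appended_data['details'].tolist())
    details['id'] = appended_data['id'].values  # key back to the parent rows

    # Drop the nested column so the remaining frame loads without the error;
    # `flat` and `details` can then each be loaded as separate tables.
    flat = appended_data.drop(columns=['details'])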

    from google.cloud import bigquery

    project = 'lake'
    client = bigquery.Client(credentials=credentials, project=project)

    # Target table: dataset XXX, table RAW_XXX.
    dataset_ref = client.dataset('XXX')
    table_ref = dataset_ref.table('RAW_XXX')

    # Autodetect the schema and overwrite any existing table contents.
    job_config = bigquery.LoadJobConfig()
    job_config.autodetect = True
    job_config.write_disposition = 'WRITE_TRUNCATE'

    client.load_table_from_dataframe(appended_data, table_ref, job_config=job_config).result()

This is the error received:

    NotImplementedError: struct

  • This is due to some limitations in how the parquet serialization works. Tracking this feature request at https://github.com/googleapis/google-cloud-python/issues/8544 – Tim Swast Jul 02 '19 at 17:10
  • @TimSwast Wes McKinney from pyarrow has asked for some support to get this feature working. Could I connect you both? – David Draper Jul 04 '19 at 07:19
  • Happy to connect on this. My email is my-last-name at google dot com. – Tim Swast Jul 11 '19 at 13:39

1 Answer


This is currently not supported due to how parquet serialization works.

A feature request to support uploading pandas DataFrames containing arrays was filed on the client library's GitHub repository:

https://github.com/googleapis/google-cloud-python/issues/8544
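
For reference, a minimal sketch that reproduces the error (the column names and data are hypothetical, and application default credentials are assumed):

    import pandas as pd
    from google.cloud import bigquery

    # Hypothetical frame with a nested column: each cell in "payload" is a
    # dict, which pyarrow converts to a struct type.
    df = pd.DataFrame({
        'id': [1, 2],
        'payload': [{'a': 1, 'b': 'x'}, {'a': 2, 'b': 'y'}],
    })

    client = bigquery.Client()  # assumes application default credentials
    table_ref = client.dataset('XXX').table('RAW_XXX')
    job_config = bigquery.LoadJobConfig()
    job_config.autodetect = True

    # The client serializes the DataFrame to parquet via pyarrow before the
    # upload; the struct column is what raises NotImplementedError: struct.
    client.load_table_from_dataframe(df, table_ref, job_config=job_config).result()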

Héctor Neri