We have the following three JSONs with data that should be loaded into the same table:
{ "name": "tom", "customValues": [] }
{ "name": "joe", "customValues": { "member": "1" } }
{ "name": "joe", "customValues": { "year": "2020", "number": "3" } }
We load data with the Python bigquery.LoadJobConfig class:
job_config = bigquery.LoadJobConfig(
    schema=SCHEMA_MAP.get(bq_table) if not autodetect else None,
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    write_disposition=(
        bigquery.WriteDisposition.WRITE_TRUNCATE
        if remove_old
        else bigquery.WriteDisposition.WRITE_APPEND
    ),
    autodetect=autodetect
)
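For context, a minimal sketch of how this job_config might be passed to a load call (the client setup, dataset name, and file name below are placeholders, not part of our actual pipeline):

from google.cloud import bigquery

client = bigquery.Client()  # assumes default project and credentials

# Placeholder file of newline-delimited JSON, one object per line.
with open("records.ndjson", "rb") as source_file:
    load_job = client.load_table_from_file(
        source_file,
        destination=f"{client.project}.my_dataset.{bq_table}",  # placeholder dataset
        job_config=job_config,
    )
load_job.result()  # wait for completion; raises if the load failed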
SCHEMA_MAP is a dictionary of arrays, where each array is the schema for one of our tables. We define our BigQuery schemas in Python using bigquery.SchemaField. If each of the 3 JSONs above were going into 3 different tables, I would have their table schemas defined as:
SCHEMA_T1 = [
    bigquery.SchemaField("name", "STRING"),
    bigquery.SchemaField("customValues", "STRING", mode="REPEATED")
]
SCHEMA_T2 = [
    bigquery.SchemaField("name", "STRING"),
    bigquery.SchemaField("customValues", "RECORD", mode="REPEATED", fields=[
        bigquery.SchemaField("member", "STRING")
    ])
]
SCHEMA_T3 = [
    bigquery.SchemaField("name", "STRING"),
    bigquery.SchemaField("customValues", "RECORD", mode="REPEATED", fields=[
        bigquery.SchemaField("year", "STRING"),
        bigquery.SchemaField("number", "STRING")
    ])
]
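For reference, a minimal sketch of how SCHEMA_MAP might be assembled from these definitions (the table names here are hypothetical, not our real ones):

SCHEMA_MAP = {
    "table_one": SCHEMA_T1,    # hypothetical table names, keyed by bq_table
    "table_two": SCHEMA_T2,
    "table_three": SCHEMA_T3,
}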
Is it possible to define the customValues column so that it handles all 3 of these different data types in one single table? How would the schema be defined for this? Currently, if SCHEMA_T1 is used and data in the form of T2 or T3 is uploaded, the upload fails with the error: Error while reading data, error message: JSON parsing error in row starting at position 0: JSON object specified for non-record field: customValues. Similar errors occur with the other schemas. Is there a generic "any JSON" field type in BigQuery that can be used for this?