I'm using pandas to write a DataFrame to a Parquet file in GCS, then using the BigQuery Data Transfer Service (BQDTS) to load that file into a BigQuery table. Sometimes, when the DataFrame is small, an entire column contains only NULL values. When this happens, BigQuery treats that all-NULL column as INTEGER instead of the type the Parquet file declares.
When I try to append the file to an existing table that expects that column to be NULLABLE STRING, the BigQuery Data Transfer Service fails with: INVALID_ARGUMENT: Provided Schema does not match Table project.dataset.dataset_health_reports. Field asin has changed type from STRING to INTEGER; JobID: xxx
When I use BQDTS to write the Parquet file to a new table, the table is created, but the all-NULL column gets the INTEGER type.
Any idea how to make BQDTS respect the original type, or how to specify the types manually?