I have a Google Cloud Storage bucket where a legacy system drops newline-delimited JSON files that need to be loaded into BigQuery.
I wrote a Google Cloud Function that takes the JSON file and loads it into BigQuery. The function works fine with sample JSON files; the problem is that the legacy system generates JSON with a non-standard key:
{
"id": 12345,
"@address": "XXXXXX"
...
}
Of course the "@address" key throws everything off and the Cloud Function errors out.
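For context, the function is roughly this (dataset/table names are placeholders and error handling is stripped):

from google.cloud import bigquery

def load_file_to_bq(event, context):
    # Triggered by a finalize event on the bucket; loads the new file into BigQuery.
    client = bigquery.Client()
    uri = "gs://{}/{}".format(event["bucket"], event["name"])
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
        autodetect=True,  # schema autodetection chokes on the "@address" key
    )
    load_job = client.load_table_from_uri(uri, "my_dataset.my_table", job_config=job_config)
    load_job.result()  # this is where it errors out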
Is there any option to "ignore" the JSON fields that have non-standard keys? Or to provide a mapping and ignore any JSON field that is not in the map? I looked around to see if I could disable autodetect and provide my own schema/mapping, but the online documentation does not cover this situation.
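To make it concrete, this is the kind of load config I was hoping to find documented — an explicit schema plus some "ignore anything not in the schema" switch. I could not confirm that this combination exists or behaves this way, so treat it as wishful thinking:

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    autodetect=False,
    schema=[
        bigquery.SchemaField("id", "INTEGER"),
        # ... the other fields I actually care about ...
    ],
    ignore_unknown_values=True,  # hoping this would just skip "@address"
)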
I am contemplating the option of (roughly as in the sketch below):
- Loading the file into memory as a string
- Replacing "@address" with "address"
- Converting the newline-delimited JSON into a list of dictionaries
- Using a BigQuery streaming insert to insert the rows into BQ
But I'm afraid this will take a lot longer, the file size may exceed the 2 GB memory limit for functions, I'll have to deal with Unicode when loading the file into a variable, etc.
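Something along these lines (table name is a placeholder, no chunking or retries):

import json
from google.cloud import bigquery, storage

def workaround(event, context):
    storage_client = storage.Client()
    bq_client = bigquery.Client()

    # Pull the whole file into memory as a string (this is the part that worries me).
    blob = storage_client.bucket(event["bucket"]).blob(event["name"])
    contents = blob.download_as_text()

    # Rename the offending key, then parse each newline-delimited record.
    contents = contents.replace('"@address"', '"address"')
    rows = [json.loads(line) for line in contents.splitlines() if line.strip()]

    # Streaming insert into BigQuery.
    errors = bq_client.insert_rows_json("my_dataset.my_table", rows)
    if errors:
        raise RuntimeError(errors)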
What other options do I have?
And no, I cannot modify the legacy system to rename the "@address" field :(
Thanks!