Google BigQuery does not enforce primary key or unique constraints.
We cannot use traditional SQL options such as INSERT IGNORE
or INSERT ... ON DUPLICATE KEY UPDATE,
so how do you prevent duplicate records from being inserted into Google BigQuery?
If I have to call delete first (based on the unique key in my own system) and then insert, just to prevent duplicate records from being inserted into BigQuery, wouldn't that be too inefficient? I would assume that insert is the cheapest operation: no query, just appending data. If I have to call delete before every insert, it will be too inefficient and will cost us extra money.
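To make the concern concrete, here is a minimal sketch of the delete-then-insert pattern described above. It uses a toy in-memory table, not BigQuery or its client library; the names `AppendOnlyTable`, `delete_where_key`, `delete_then_insert`, and the `rows_scanned` counter are all hypothetical and exist only for illustration. The counter is a crude stand-in for the extra work each delete adds, not a model of how BigQuery actually bills.

```python
from typing import Dict, List

Row = Dict[str, object]


class AppendOnlyTable:
    """Toy in-memory stand-in for a table with no key constraints (not BigQuery)."""

    def __init__(self) -> None:
        self.rows: List[Row] = []
        self.rows_scanned = 0  # crude proxy for the extra work done by deletes

    def insert(self, row: Row) -> None:
        # Plain append: no lookup, so duplicates would be accepted silently.
        self.rows.append(row)

    def delete_where_key(self, key: str, value: object) -> int:
        # With no index in this toy model, a delete examines every existing row.
        self.rows_scanned += len(self.rows)
        before = len(self.rows)
        self.rows = [r for r in self.rows if r.get(key) != value]
        return before - len(self.rows)


def delete_then_insert(table: AppendOnlyTable, row: Row, key: str = "id") -> None:
    # Remove any earlier row with the same key, then append the new one.
    table.delete_where_key(key, row[key])
    table.insert(row)


table = AppendOnlyTable()
for record in [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 1, "v": "c"}]:
    delete_then_insert(table, record)
```

In this sketch the table ends up with one row per `id`, but every insert pays for a scan of the existing rows first, which is exactly the overhead I am worried about.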
What advice or suggestions do you have, based on your experience?
It would be nice if BigQuery had primary keys, but might that conflict with the algorithms and data structures that BigQuery is based on?