I have a dataset in BigQuery with 100,000+ rows and 10 columns, and I'm continuously adding new data to it. I want to fetch the rows that haven't been processed yet, process them, and write the results back to my table. Currently, I fetch them into a pandas DataFrame using the BigQuery
Python library and process them with pandas.
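Roughly, the fetch step looks like this (just a sketch; credentials and project_id come from my setup, and the processed column is a placeholder for whatever flag marks unprocessed rows):

from google.cloud import bigquery

bqclient = bigquery.Client(credentials=credentials, project=project_id)

# Select only the rows that haven't been processed yet.
sql = """
    SELECT *
    FROM `dataset.table`
    WHERE processed IS NULL
"""

# Run the query and download the result as a pandas DataFrame.
df = bqclient.query(sql).to_dataframe()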
Now I want to update the table with the newly processed data. One way of doing it is to build a SQL statement and call the query
method of the bigquery.Client
class, or to run it as a job like here:
from google.cloud import bigquery

bqclient = bigquery.Client(
    credentials=credentials,
    project=project_id,
)

# Update a single row's value with a DML statement.
query = """UPDATE `dataset.table` SET field_1 = '3' WHERE field_2 = '1'"""
bqclient.query(query)
But it doesn't make sense to build a separate UPDATE statement for every row.
Another way I found is the to_gbq function of the pandas-gbq package. The disadvantage of this is that it rewrites the whole table.
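For reference, this is roughly how that call looks (a sketch; df, project_id, and credentials are the same as above):

import pandas_gbq

# With if_exists='replace' the existing table is dropped and recreated from
# the DataFrame, so any rows not in the DataFrame are lost; 'append' would
# add duplicates instead of updating existing rows.
pandas_gbq.to_gbq(
    df,
    'dataset.table',
    project_id=project_id,
    if_exists='replace',
    credentials=credentials,
)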
Question: What is the best way to update a BigQuery table from a pandas DataFrame?