I have a huge logs table in BigQuery, loaded from files in Google Cloud Storage. Querying it is becoming more and more expensive, and I need to find a solution; I think the answer is to partition the table.
I can migrate the old data from the big table to the newly created partitioned one myself, but I am struggling to modify my existing Python code, which loads data daily from Cloud Storage into the BigQuery table. I want it to load each day's data into the corresponding daily partition.
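For context, I created the new table as a day-partitioned (ingestion-time) table, roughly like this; the dataset, table name and inline schema here are placeholders rather than my real ones:

import commands

# One-time setup: create a table partitioned by ingestion day (placeholders below)
mk_cmd = 'bq mk --time_partitioning_type=DAY mydataset.logs_partitioned click_time:TIMESTAMP,url:STRING'
print commands.getstatusoutput(mk_cmd)

Below is the relevant part of the script that does the daily load: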
import commands   # Python 2 module used to run the bq CLI through the shell

def bq_load(filename):
    # format, schema, table, google_bucket and recipient are set earlier in the script
    bq_load_cmd = 'bq load --skip_leading_rows=1 --source_format=' + format + ' ' + schema + '.' + table + ' ' + google_bucket + '/' + filename
    print bq_load_cmd
    bq_load_cmd_out = commands.getstatusoutput(bq_load_cmd)
    print bq_load_cmd_out
    bq_status = bq_load_cmd_out[0]
    if bq_status == 0:
        print 'Bq load successful, filename : ' + filename
    else:
        # On failure, send a notification mail to the recipient
        print 'Error loading the file in bq : ' + filename
        mail_cmd = 'mail -s "bq load error hasoffer_clicks" ' + recipient + ' < /dev/null'
        mail_cmd_out = commands.getstatusoutput(mail_cmd)
This is the part of the code that runs from crontab and loads the CSV files into the BigQuery table once a day. I searched the internet, but so far I haven't managed to work out the proper solution to my problem. I think the bq load command is probably what needs to change, but I am not sure.
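From what I have read, one option might be to load each daily file straight into its partition with the $YYYYMMDD decorator, taking the date from the filename. A rough sketch of how I imagine the command would change (the filename pattern and the date parsing are just my assumption):

def bq_load(filename):
    # Assumes a filename like clicks_20171201.csv (placeholder pattern)
    file_date = filename.split('_')[-1].replace('.csv', '')   # e.g. '20171201'
    # Target the daily partition with the $YYYYMMDD decorator; the single quotes
    # stop the shell from trying to expand the '$' when the command runs
    target = "'" + schema + '.' + table + '$' + file_date + "'"
    # format, schema, table and google_bucket are the same globals as above
    bq_load_cmd = 'bq load --skip_leading_rows=1 --source_format=' + format + ' ' + target + ' ' + google_bucket + '/' + filename
    print bq_load_cmd
    bq_load_cmd_out = commands.getstatusoutput(bq_load_cmd)
    print bq_load_cmd_out

Would something like that work, or would it be better to add a timestamp column and let BigQuery partition on it with --time_partitioning_field at load time?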
Any suggestions? Kind regards, Eugenio