2

I have a huge logs table stored in GoogleCloud storage. Querying it it is becoming more and more expensive, and I have to find a solution, which I think is to partition the table.

I can manage by myself to upload old data from the big table to the new created partitioned one, but I am struggling to modify my existing python code which is loading data daily from the storage to the biguqery table. I want it to provide data already partitioned by different days.

def bq_load(filename):
    bq_load_cmd = 'bq load --skip_leading_rows=1 --source_format=' + format + ' ' + schema + '.' + table + ' ' + google_bucket + '/' + filename
    print bq_load_cmd
    bq_load_cmd_out = commands.getstatusoutput(bq_load_cmd)
    print bq_load_cmd_out
    bq_status = bq_load_cmd_out[0]
    if bq_status == 0:
        print 'Bq load successfull filename : ' + filename
    else:
        print 'Error loading the file in bq : ' + filename
        mail_cmd = 'mail -s " bq load error hasoffer_clicks" '  + recipient + ' < /dev/null'
        mail_cmd = commands.getstatusoutput(mail_cmd)

This is part of the code I am using right now, which is in the crontab and loads csv files into the table in BigQuery on a daily basis. I searched through internet but so far I didn't manage to understand which is the proper solution to my problem. I think the bq load command might be the one to modify, but I am not sure.

Any suggestion? Kind regards, Eugenio

Lucasaudati11
  • 577
  • 1
  • 4
  • 8

2 Answers2

0

Firstly you should migrate from non-partitioned table to partitioned, for this you could look here: https://stackoverflow.com/questions/38993877/migrating-from-non-partitioned-to-partitioned-tables

Secondly to insert data in to particular partition you should use in your code decorator:

table + '$20171107'

If you need more details for inserting into particular partition look here: https://cloud.google.com/bigquery/streaming-data-into-bigquery#bigquery-stream-data-python

And that's all :)

rtbf
  • 1,509
  • 1
  • 16
  • 35
0

Is there any way to convert from non-partitioned to partitioned without using a query since it might be quite expensive for the huge amount of data?

praskovy
  • 1
  • 2