How to create a date partitioned table while using a loadjob in google Bigquery?

Question

Would someone be able to explain how to create date partitioned table while using a loadjob in google Bigquery using JobConfig.

https://cloud.google.com/bigquery/docs/creating-column-partitions#creating_a_partitioned_table_when_loading_data

I couldnt understand the documentation, if someone could explain with an example it would be very helpful.

Edited: So I thought I figured out the object thanks to @irvifa, but I am still not able to create a TimePartitioned Table, here is the code am trying to use.

import pandas
from google.cloud import bigquery


def load_df(self, df):
  project_id="ProjectID"
  dataset_id="Dataset"
  table_id="TableName"
  table_ref=project_id+"."+dataset_id+"."+table_id
  time_partitioning = bigquery.table.TimePartitioning(field="PartitionColumn")
  job_config = bigquery.LoadJobConfig(
                         schema="Schema",
                         destinationTable=table_ref
                         write_disposition="WRITE_TRUNCATE",
                         timePartitioning=time_partitioning
                         )
  Job = Client.load_table_from_dataframe(df, table_ref, 
                                         job_config=job_config)
  Job.result()

score 6 · Accepted Answer · answered Apr 10 '20 at 02:30

6

I don't know whether it will help, but you can use the following sample to load job with partition:

from datetime import datetime, time
from concurrent import futures
import math
from pathlib import Path
from google.cloud import bigquery

def run_query(self, query_job_config):
  time_partitioning = bigquery.table.TimePartitioning(field="partition_date")
  job_config = bigquery.QueryJobConfig()
  job_config.destination = query_job_config['destination_dataset_table']
  job_config.time_partitioning = time_partitioning
  job_config.use_legacy_sql = False
  job_config.allow_large_results = True
  job_config.write_disposition = 'WRITE_APPEND'
  sql = query_job_config['sql']
  query_job = self.client.query(sql, job_config=job_config)
  query_job.result()

answered Apr 10 '20 at 02:30

irvifa

1,865
2
16
20

Oh my friend! You did help me out. Awesome! Thank you, I see what you did there. I didnt know how to create a Timepartitioning Object. time_partitioning = bigquery.table.TimePartitioning(field="partition_date") That does it and i used that in my LoadJobConfig for timePartitioning attribute. Worked :) – Kishan Kumar Apr 10 '20 at 23:01
Hi @KishanKumar glad to help you, can you please mark this as the answer? Thanks! – irvifa Apr 11 '20 at 10:04
@irvifa I thought I got the output, but its still created an non partitioned table, I have updated the question, if you think you can tell me what I am doing wrong now, I will mark it closed. – Kishan Kumar Apr 11 '20 at 21:57
1

I think it should be `time_partitioning` not timePartitioning for the destination table I think it should be destination_table_description. Please refer to https://googleapis.dev/python/bigquery/latest/generated/google.cloud.bigquery.job.LoadJobConfig.html#google.cloud.bigquery.job.LoadJobConfig – irvifa Apr 12 '20 at 07:34
duh! You are right, I used timePartitioning looking at https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#jobconfigurationload. My bad, Thank you again. – Kishan Kumar Apr 12 '20 at 18:42

Kishan Kumar · Answer 2 · 2020-04-12T18:43:13.420

Thanks to irvifa.

I was trying to load dataframe and was looking for LoadJobConfig, but it was very similar.

I would post my answer just in case someone needs any example for LoadJob.

import pandas
from google.cloud import bigquery


def load_df(self, df):
  project_id="ProjectID"
  dataset_id="Dataset"
  table_id="TableName"
  table_ref=project_id+"."+dataset_id+"."+table_id
  time_partitioning = bigquery.table.TimePartitioning(field="PartitionColumn")
  job_config = bigquery.LoadJobConfig(
                         schema="Schema",
                         write_disposition="WRITE_TRUNCATE",
                         time_partitioning=time_partitioning
                         )
  Job = Client.load_table_from_dataframe(df, table_ref, 
                                         job_config=job_config)
  Job.result()

How to create a date partitioned table while using a loadjob in google Bigquery?

2 Answers2