0

I'm trying to materialize hive table based on file that are stored as parquet in GCS, with path like gs://abc/dt=02-02-2019/hr=02(physical partition based on UTC)

Now I want to create two hive table where the logical partition is based on timezone, say one for UTC and other for CET, how can I partition such that date and hour based partition picks the dt and hr value based on timezone. Also it would be great if it can also accommodate for day-light saving etc.

I am using airflow to create external hive table.

Sandeep
  • 131
  • 1
  • 9
  • It is not clear about logical partition. Could you please provide some example – leftjoin Feb 14 '19 at 13:54
  • Logical partitions are nothing but the partition of my hive table, It is physically stored gcs with partition based on dt and hr (the date and hour is based on UTC) , now I just want to create a hive table on top of this but the hive partition(dt and hr) should be based on a different time zone, eg CET based – Sandeep Feb 14 '19 at 14:20
  • And timezone is existing column or can be derived from existing column? – leftjoin Feb 14 '19 at 14:32
  • i doubt your question has much to do with `Airflow`. You'll have to [convert](https://stackoverflow.com/a/13287083/3679900) the [`execution_date`](https://stackoverflow.com/a/47543448/3679900) (i think presently it is only in `UTC`) of your `task` into `CET` / any other timezone and then pass that to your `Hive` `DML` / `DDL` (`HQL`) statement – y2k-shubham Feb 14 '19 at 17:31

1 Answers1

0

there is ablog that explains this well https://medium.com/udemy-engineering/supporting-multiple-time-zones-on-hive-with-single-data-source-b884cba46451

The basic idea is to have data stored as utc time. And partitioned by utc hour. That way we can have two hive tables. One hive table points as is, which is utc. But for lets a PT hive table, you would point 18th hour to UTC 11th hour, so there is mapping conversion happening at each.

Gaurav Shah
  • 5,223
  • 7
  • 43
  • 71