I have two tables with schema
table: run1
id string week_date string metric double
table: run2
id string metric double week_date string statistic int Partition Information col_name data_type comment week_date string statistic int
I would like to group the data in to equal sizes for each week date and then write the contents to a new table which is partitioned based on the week date as well the statistic(Statistic is nothing but the bucket id).
I find that the query partitions results correctly however the contents within a partition are not sorted
Below is the query I am using and the data I am using as well as the output from one of the partitions
Query:
insert overwrite table run2 partition(week_date, statistic) select id,metric, week_date, ntile(3) over (PARTITION BY week_date order by metric) as statistic from run1 distribute by week_date sort by metric desc;
Input:
B0001 2015-01-08 200.0 B0002 2015-01-08 200.0 B0003 2015-01-08 800.0 B0004 2015-01-08 600.0 B0005 2015-01-08 5400.0 B0006 2015-01-08 1100.0 B0007 2015-01-08 100.0 B0008 2015-01-08 300.0
Output of Partition: week_date=2015-01-08/statistic=2
B0003^A800.0 B0008^A300.0 B0004^A600.0
I was expecting the contents to be sorted by the metric value, however it is not. If I do not insert results to another table and just do a simple select I do see that the contents are indeed sorted. Is there something special that needs to be done when performing inserts?