I'm new to Hive, so this is a basic question: how do I write a query so that its result is stored partitioned in a specific way?
For example:
CREATE TABLE IF NOT EXISTS tbl_x (
x SMALLINT,
y FLOAT)
PARTITIONED BY (id SMALLINT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS ORC;
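-- tbl_x is partitioned, so the insert needs a PARTITION clause;
-- with a dynamic partition, the partition column (id) goes last in each row
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT INTO TABLE `tbl_x` PARTITION (`id`)
VALUES (1, 1.0, 1),
(1, 2.0, 1),
(2, 3.0, 1),
(2, 4.0, 1),
(1, 5.0, 2),
(1, 6.0, 2),
(2, 7.0, 2),
(2, 8.0, 2);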
CREATE TABLE tbl_y AS SELECT `id`, `x`, SUM(`y`) AS `y_sum`
FROM `tbl_x`
GROUP BY `id`, `x`;
In that example, I'd like tbl_y to be partitioned by id as well.
This attempt doesn't work:
CREATE TABLE tbl_y AS SELECT `id`, `x`, SUM(`y`) AS `y_sum`
FROM `tbl_x`
GROUP BY `id`, `x` PARTITIONED BY (id SMALLINT);
What is the trick here? Should I define the partitioned table first and then insert the results into it?
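In other words, something like the following (an untested sketch). It assumes dynamic partitioning is allowed on the cluster (`hive.exec.dynamic.partition=true` and `hive.exec.dynamic.partition.mode=nonstrict`), and that the partition column has to come last in the SELECT list so Hive can route each row to its partition. The `DOUBLE` type for `y_sum` is my assumption, since `SUM` over a `FLOAT` column returns `DOUBLE` in Hive.

```sql
-- Let Hive derive the partition values from the query result itself
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Declare the target table, including its partition column, up front
CREATE TABLE IF NOT EXISTS tbl_y (
x SMALLINT,
y_sum DOUBLE)
PARTITIONED BY (id SMALLINT)
STORED AS ORC;

-- Dynamic-partition insert: the partition column (id) is the last SELECT column
INSERT OVERWRITE TABLE tbl_y PARTITION (id)
SELECT `x`, SUM(`y`) AS `y_sum`, `id`
FROM `tbl_x`
GROUP BY `id`, `x`;
```

If that is the right approach, `SHOW PARTITIONS tbl_y;` should then list one partition per distinct `id` (`id=1` and `id=2` for the sample data).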