I'm writing a script that automates a few Hive queries, and we have found that from time to time we need to clear the data from a table and insert new data. We are wondering which of these is faster:
INSERT OVERWRITE TABLE SOME_TABLE
SELECT * FROM OTHER_TABLE;
or is it faster to do it like this:
DROP TABLE SOME_TABLE;
CREATE TABLE SOME_TABLE (STUFFS);
INSERT INTO TABLE SOME_TABLE
SELECT * FROM OTHER_TABLE;
The overhead of running the extra queries is not an issue, since we already have the creation script as well. The question is: with billions of rows, is INSERT OVERWRITE faster than DROP + CREATE + INSERT INTO?
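
For reference, this is roughly how we could time the two variants side by side. It is only a sketch: the scratch table names bench_overwrite and bench_recreate are made up for illustration, and it assumes OTHER_TABLE is not partitioned, so that CREATE TABLE ... LIKE followed by a plain insert works. The Hive CLI prints a "Time taken" line after each statement, which is what we would compare.

```sql
-- Variant 1: INSERT OVERWRITE into an existing table.
-- bench_overwrite is a hypothetical scratch table, not part of our real schema.
CREATE TABLE IF NOT EXISTS bench_overwrite LIKE OTHER_TABLE;
INSERT OVERWRITE TABLE bench_overwrite
SELECT * FROM OTHER_TABLE;

-- Variant 2: DROP + CREATE + INSERT INTO.
-- bench_recreate is likewise a hypothetical scratch table.
DROP TABLE IF EXISTS bench_recreate;
CREATE TABLE bench_recreate LIKE OTHER_TABLE;
INSERT INTO TABLE bench_recreate
SELECT * FROM OTHER_TABLE;
```

Both variants run the same SELECT, so any difference should come from how the old data is removed and how the table metadata is handled.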