I want to run some temporary operations on files in HDFS using Hive, so I do not want to use an internal (managed) table. But my data is huge, for example 1 TB, so I am worried about the performance of an external table. What is the performance difference between an internal table and an external table in Hive?
- I hope you are asking about the difference between internal and external tables in Hive. Please clarify. – Sandeep Singh Dec 24 '16 at 02:50
- Yes, I used the wrong word, "extend". I'm sorry. I searched again with the right word and found some answers saying there is no performance difference between them. Is that right? – ElapsedSoul Dec 24 '16 at 03:51
- Refer to this answer of mine: http://stackoverflow.com/a/37192041/2142994 – Ani Menon Dec 24 '16 at 12:27
- Yes, there is no major difference in performance between the two table types. But since your data is large and you are only using Hive temporarily, you should use an internal table. – Sandeep Singh Dec 24 '16 at 18:26
- Why should I use an internal table when my data is large, if there is no difference between them? – ElapsedSoul Dec 24 '16 at 22:19
- In the end, I think there is no performance difference between them. The only difference is that dropping an internal table also deletes its data. So if you want to drop both the table and the data, an internal table is less work. And if performance is the goal, you could use an ORC table, as mentioned by @Ani Menon. – ElapsedSoul May 09 '20 at 01:58
2 Answers
You can simply create Hive external tables and use them. I haven't noticed any major difference in performance between internal and external tables.
To improve performance, you can create tables in the ORC file format, which are managed by Hive.
Create the ORC table:
CREATE TABLE IF NOT EXISTS <orc_table_name> (
  <col_name> <type>
)
COMMENT 'comments'
-- ORC is a binary columnar format, so no field delimiter is specified
STORED AS ORC;
Then insert into ORC tables:
INSERT OVERWRITE TABLE <orc_table_name> SELECT * FROM <external_table_name>;
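
For the INSERT above to have something to read, `<external_table_name>` has to be defined over the files that already sit in HDFS. A minimal sketch of what that could look like follows. The table name `raw_events`, its columns, the comma delimiter, and the path `/data/raw_events` are all hypothetical placeholders, assuming plain comma-separated text files; adjust them to the real data.

```sql
-- Hypothetical names: raw_events, its columns, and the HDFS path are placeholders.
CREATE EXTERNAL TABLE IF NOT EXISTS raw_events (
  event_id   BIGINT,
  event_time STRING,
  payload    STRING
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/raw_events';   -- directory that already holds the files

-- Query the files in place; CREATE EXTERNAL TABLE does not copy any data.
SELECT COUNT(*) FROM raw_events;

-- Check the table type in the output (Table Type: EXTERNAL_TABLE).
DESCRIBE FORMATTED raw_events;

-- Dropping an external table removes only the metastore entry;
-- the files under /data/raw_events stay on HDFS.
DROP TABLE raw_events;
```

Because the external table only points at the existing directory, it suits temporary work: once the table is dropped, the 1 TB of source files is left untouched.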

Ani Menon
The difference between external and internal table performance that I have experienced is:
Internal tables take more CPU time.
External tables take approximately 40% less CPU time.

swati