1

From this link, I've learnt that there are several differences between internal and external tables related to storage and dropping. Can anyone tell me is there any difference about query efficiency?

Community
  • 1
  • 1
Fihop
  • 3,127
  • 9
  • 42
  • 65
  • We have seen data insertion time difference in some cases for tables with partitions and buckets . Internal tables out performing external tables in these cases. – K S Nidhin Jan 20 '16 at 18:21

3 Answers3

3

There is nothing like internal tables. Hive have managed tables and external tables. There will be no performance difference between the two.

Two differences from DDL perspective:

  1. Hive will have control over files in case of managed table. If you drop the table, then the data in HDFS and metadata in metastore DB will be gone. For external tables only metadata in metastore tables will be gone.
  2. Syntax difference. Specifying location is mandatory for external tables.

From query perspective, no difference at all.

Durga Viswanath Gadiraju
  • 3,896
  • 2
  • 14
  • 21
1

To answer you Question :

For External Tables ,Hive does not move the data into its warehouse directory. If the external table is dropped, then the table metadata is deleted but not the data.

For Internal tables , Hive moves data into its warehouse directory. If the table is dropped, then the table metadata and the data will be deleted. For your reference,

Difference between Internal & External tables :

For External Tables -

    External table stores files on the HDFS server but tables are not linked to the source file completely.

    If you delete an external table the file still remains on the HDFS server.

    As an example if you create an external table called “table_test” in HIVE using HIVE-QL and link the table to file “file”, then deleting “table_test” from HIVE will not delete “file” from HDFS.

    External table files are accessible to anyone who has access to HDFS file structure and therefore security needs to be managed at the HDFS file/folder level.

    Meta data is maintained on master node, and deleting an external table from HIVE only deletes the metadata not the data/file.

For Internal Tables-

    Stored in a directory based on settings in hive.metastore.warehouse.dir, by default internal tables are stored in the following directory “/user/hive/warehouse” you can change it by updating the location in the config file .
    Deleting the table deletes the metadata and data from master-node and HDFS respectively.
    Internal table file security is controlled solely via HIVE. Security needs to be managed within HIVE, probably at the schema level (depends on organization).

Hive may have internal or external tables, this is a choice that affects how data is loaded, controlled, and managed.

Use EXTERNAL tables when:

The data is also used outside of Hive. For example, the data files are read and processed by an existing program that doesn’t lock the files.
Data needs to remain in the underlying location even after a DROP TABLE. This can apply if you are pointing multiple schema (tables or views) at a single data set or if you are iterating through various possible schema.
Hive should not own data and control settings, directories, etc., you may have another program or process that will do those things.
You are not creating table based on existing table (AS SELECT).

Use INTERNAL tables when:

The data is temporary.
You want Hive to completely manage the life-cycle of the table and data.

Source :

HDInsight: Hive Internal and External Tables Intro

Internal & external tables in Hadoop- HIVE

vishal
  • 352
  • 2
  • 12
0

Hive provides us data warehousing facilities on top of an existing Hadoop cluster. Along with that it provides an SQL like interface.

You can create table in two different ways.

  1. Create External table

CREATE EXTERNAL TABLE students (id INT, name STRING, batch STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' #supply delimiter LOCATION '/user/hdfs/students';

For External Tables Hive does not move the data into its warehouse directory. If the external table is dropped, then the table metadata is deleted but not the data.

  1. Create Normal Table

CREATE TABLE students (id INT, name STRING, batch STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' #supply delimiter LOCATION '/user/hddfs/students';

For Normal tables hive moves data into its warehouse directory. If the table is dropped, then the table metadata and the data will be deleted.

You can check this.

USB
  • 6,019
  • 15
  • 62
  • 93