0

Step 1. I create a hive external table "test" and give some location say '/user/hive2'.
Step 2. I load a file in "test" table.
Step 3. I do a select * from test. I get my records.
Step 4. I drop the table by drop table test.
Step 5. I do select * from test and get table not found.

What is the significance of using external table then? If the metadata is deleted of "test" table and I cant query anything on it, then why do we use external table?

earl
  • 738
  • 1
  • 17
  • 38

5 Answers5

2

@Nishant : you should do a little homework before asking generic questions in this forum. Anyways, here's a quick link for detailed explanation on hive external tables in general - External Hive Tables

When you drop an internal table, it drops the data, and it also drops the metadata. You will no longer have access to the data

When you drop an external table, it only drops the meta data. That means hive is ignorant of that data now. It does not touch the data itself. You can now access the data and perform any operations (if needed). External tables are also preferred when you need a pit stop for your data and then dump the entire data in to a managed table for hive opertaions

Also, here's notes from another stackoverflow thread

Use EXTERNAL tables when:

  1. The data is also used outside of Hive. For example, the data files are read and processed by an existing program that doesn't lock the files.
  2. Data needs to remain in the underlying location even after a DROP TABLE. This can apply if you are pointing multiple schemas (tables or views) at a single data set or if you are iterating through various possible schemas.
  3. You want to use a custom location such as ASV.
  4. Hive should not own data and control settings, dirs, etc., you have another program or process that will do those things.
  5. You are not creating table based on existing table (AS SELECT).

Use INTERNAL tables when:

  1. The data is temporary.
  2. You want Hive to completely manage the lifecycle of the table and data.

You want Hive to completely manage the lifecycle of the table and data.

Hope this is helpful.

Community
  • 1
  • 1
Anil
  • 420
  • 2
  • 16
0

A few things.

  1. You're specifying the location of the data. If you just create a managed table, Hive will decide where it stores the table data, but in an external table, you're able to determine that location.

  2. An external table can point to data that already exists in a specified location. In your example, you're inserting into an empty external table, but one use case is to create the table pointing to a location where data resides. In other words, if the data is already there, there's no need to insert into the table at all.

  3. Most importantly, if you drop the table, the data does not get removed. In a managed table, if you insert data and then drop the table, Hive removes the table definition from the metastore, but ALSO removes the data itself. In an external table, only the metastore reference is removed, and the data remain where you've specified.

economy
  • 4,035
  • 6
  • 29
  • 37
0

Step 4. I drop the table by drop table test.

Step 5. I do select * from test and get table not found.

let us assume you recreated the table test.

Try select * from test. This query returns the data bcoz table test is external table.

The table gets deleted(meta data) but data exists in HDFS.

The same will not happen with internal tables which deletes the data from HDFS as well.

yoga
  • 1,929
  • 2
  • 15
  • 18
0

What is the significance of using external table then? If the metadata is deleted of "test" table and I cant query anything on it, then why do we use external table?

when you drop an external table, the underlying data files stay intact. If you want to query the table, don't drop metadata. How can you query a table by dropping metadata?

Unlike internal table, where the data is temporary and getting deleted after creation of table, External table still keeps data at the source directory. Have a look at related question

Even after deleting the metadata, still you can access data file at source directory.

Community
  • 1
  • 1
Ravindra babu
  • 37,698
  • 11
  • 250
  • 211
-1
  1. If you want to access the table by ETL tools then we have to create external tables.
  2. If you want create a table 2 which has same schema as table 1 then external tables are useful. create external table table2 as (select * from table1)
  3. If you want to add new data set to a table by keeping the old data set as it is then external tables are useful.
Usman Maqbool
  • 3,351
  • 10
  • 31
  • 48