no_drop only restricts dropping the table, but there are scenarios where you may need to drop and re-create the table while keeping the data.
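For context, a sketch of what the no_drop protection looks like in older Hive versions (the table name here is a hypothetical placeholder; this protection feature was removed in later Hive releases):

```sql
-- Enable NO_DROP protection: DROP TABLE will fail until it is disabled.
ALTER TABLE my_table ENABLE NO_DROP;

-- DROP TABLE my_table;   -- would fail while protection is on

ALTER TABLE my_table DISABLE NO_DROP;
```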
Scenario 1: You may need to change the table definition and re-create the table without deleting the data. For example, you have some JSON files loaded by an upstream process; the upstream process added one more attribute to the JSON, and you want to re-create your table definition to include the new column while keeping all the data as is. Or you may want to add more columns that are parsed from semi-structured files (using RegexSerDe), or just change the regex in the table DDL to improve extraction.
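A minimal sketch of that re-create flow (table name, columns, and location are assumptions for illustration). Because the table is EXTERNAL, dropping it removes only the metadata and the files stay in place:

```sql
-- Drop removes metadata only; the JSON files under LOCATION are untouched.
DROP TABLE IF EXISTS my_json_table;

-- Re-create with the new column added by the upstream process.
CREATE EXTERNAL TABLE my_json_table (
  id       STRING,
  name     STRING,
  new_attr STRING   -- the newly added JSON attribute
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION '/data/raw/my_json_table';
```

After the CREATE, all the existing files are immediately queryable through the new definition.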
Scenario 2: You are loading data into partition folders daily. Some downstream process needs only the latest partition, and you cannot pass a parameter to it, while some other process wants the whole table. In this case you can create a partitioned table, plus a non-partitioned "latest" table on top of the latest partition directory, dropping and re-creating it with the new location on each upstream load. The downstream process can use the latest table (without knowing in advance which partition is the latest, and with no need to find it), and the whole partitioned table can also be used.
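The setup above might look like this (names, columns, and paths are hypothetical; an ALTER TABLE ... SET LOCATION would work instead of the drop/re-create):

```sql
-- The partitioned table sees all the daily folders.
CREATE EXTERNAL TABLE events (
  event_id STRING,
  payload  STRING
)
PARTITIONED BY (load_date STRING)
LOCATION '/data/events';

-- In the upstream load job: re-point the "latest" table at the newest
-- partition directory by dropping and re-creating it.
DROP TABLE IF EXISTS events_latest;
CREATE EXTERNAL TABLE events_latest (
  event_id STRING,
  payload  STRING
)
LOCATION '/data/events/load_date=2024-01-31';  -- newest partition dir
```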
Scenario 3: You may want to load data into a location outside the configured managed-tables root directory, see https://stackoverflow.com/a/67073849/2700344 And there may be policies in your warehouse restricting loading raw data into the managed-tables directory. On the Cloudera distribution you cannot create a managed table with a location other than the default one. This is why you may need an EXTERNAL table.
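For example (path and table name are placeholders), an external table can point anywhere the Hive user can read, whereas a managed table on Cloudera is pinned under the warehouse root:

```sql
-- Landing directory outside hive.metastore.warehouse.dir;
-- only an EXTERNAL table may use this LOCATION on Cloudera.
CREATE EXTERNAL TABLE raw_landing (
  line STRING
)
LOCATION '/landing/vendor_feed';
```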
Scenario 4: Tables that use the JDBC storage handler, the Kafka storage handler, or the HBase storage handler can be created as EXTERNAL only.
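As an illustration, a JDBC-backed table sketch (connection details, credentials, and the remote table name are placeholders, not working values):

```sql
-- EXTERNAL is mandatory for storage-handler tables like this one.
CREATE EXTERNAL TABLE mysql_orders (
  order_id INT,
  amount   DOUBLE
)
STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler'
TBLPROPERTIES (
  'hive.sql.database.type' = 'MYSQL',
  'hive.sql.jdbc.driver'   = 'com.mysql.jdbc.Driver',
  'hive.sql.jdbc.url'      = 'jdbc:mysql://dbhost:3306/shop',
  'hive.sql.dbcp.username' = 'hive_user',
  'hive.sql.dbcp.password' = 'secret',
  'hive.sql.table'         = 'orders'
);
```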
Of course, in some cases you can use ALTER TABLE instead of DROP/CREATE, but it is convenient to keep the table DDL as a single file, versioned in the repo, without a trail of ALTERs.
And there may well be other possible use cases.