Currently testing a cluster and when using the "CREATE TABLE AS"
the resulting managed table ends up being one file ~ 1.2 GB while the base file the query is created from has many small files. The SELECT portion runs fast, but then the result is 2 reducers running to create one file which takes 75% of the run time.
Additional testing:
1) If using "CREATE EXTERNAL TABLE AS"
is used the query runs very fast and there is no merge files step involved.
2) Also, the merging doesn't appear to occur with version HDP 3.0.1.