I am running a simple query of the following form:
INSERT OVERWRITE table TABLE2
PARTITION(COLUMN)
SELECT *
FROM TABLE1
There is nothing wrong with the query syntax-wise.
TABLE2 is empty, and the total size of TABLE1 is 2 GB in HDFS (stored as Parquet with Snappy compression).
When I run the query in Hive, I see that 17 map tasks and 0 reduce tasks are launched.
What I notice is that most of the map tasks complete within a minute, but one map task takes a long time. It looks as if all the data in the table is going to that one map task.
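One way to check whether the data really is unevenly distributed would be to count the rows per partition value in the source table. This is only a sketch: COLUMN is the placeholder name from the query above and stands for the real partition column, and the LIMIT of 20 is an arbitrary choice.

```sql
-- Rows per partition value in the source table, largest first.
-- COLUMN is a placeholder for the actual partition column name.
SELECT COLUMN, COUNT(*) AS row_count
FROM TABLE1
GROUP BY COLUMN
ORDER BY row_count DESC
LIMIT 20;
```

If one value accounts for most of the rows, or if there are a very large number of distinct values, that might be related to the behaviour described here.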
The whole query eventually fails with a container physical memory limit error.
Are there any reasons why this is happening or might happen?