I'm reading the literature comparing Hive and Impala.
Several sources state some version of the following "cold start" line:
It is well known that MapReduce programs take some time before all nodes are running at full capacity. In Hive, every query suffers this “cold start” problem.
In my opinion, this statement alone is not enough to understand what is meant by "cold start". I'm looking for more information and clarity on what it refers to.
For context, I'm a data scientist. I write queries and have only a basic understanding of big data concepts.
I've read questions that explain why Impala is faster (example), but they don't explicitly address or define cold start.