I'm new to my role, and part of it involves creating and inserting data into both managed and external Hive tables. We have a few lines of 'set' parameters that we run at the beginning of a Hive session, but I've run into cases where, for example, the output files are merged for some partitions (a small number of files) but not for others (many small files), seemingly on random days.
My question is: when do I need to enter my Hive set parameters? Do they have to be entered before every single insert/command/statement I run, or just once at the beginning of the session, right after launching Hive?
These are the standard set parameters we've been using:
SET mapred.job.queue.name=yometrics;
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.max.dynamic.partitions=2000;
SET hive.exec.max.dynamic.partitions.pernode=2000;
SET hive.merge.tezfiles=true;
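
To make the question concrete, here is a sketch of how I assume the values could be checked partway through a session. In the Hive CLI and Beeline, SET followed by a property name and no value prints that property's current value, and a bare SET prints the variables overridden by the user or by Hive. Running these right before an insert should show whether the settings above are still in effect; I have not yet confirmed what they return on the days when the files are not merged.

```sql
-- Print the current value of a single property (note: no "=value" part).
SET hive.merge.tezfiles;
SET hive.exec.dynamic.partition.mode;

-- Print the configuration variables overridden by the user or by Hive.
SET;
```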