
I have two Hive scripts which look like this:

Script A:

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.parallel=true;

... do something ...

Script B:

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.parallel=true;

... do something else ...

The options that we set at the beginning of each script are the same. Is it possible to extract them into a common place (for example, a commonoptions.sql) so that our scripts look like this:

Script A:

 <run commonoptions.sql>

... do something ...

Script B:

 <run commonoptions.sql>

... do something else ...

Ideally I would also like to extract the table definitions, so that I have:

Script A:

 <run commonoptions.sql>
 <run defineExternalTableXYZ.sql>
... do something with Table XYZ ...

Script B:

 <run commonoptions.sql>
 <run defineExternalTableXYZ.sql>
... do something else with Table XYZ ...

That way I can manage the Table XYZ definition in a single place. I am not using the Hive CLI; I am using Amazon EMR with Hive Steps.


2 Answers


You can store these configuration parameters in a common file and load it in each of your scripts using the source command:

source /tmp/common_init.hql;

You can also generate this file for each workflow from the database.
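A minimal sketch of the layout (the file name and path are placeholders; the settings are the ones from the question):

-- common_init.hql: shared settings managed in one place
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.parallel=true;

-- scriptA.hql: pull in the shared settings, then do the real work
source /tmp/common_init.hql;
-- ... do something ...

The same source line can also pull in a file such as defineExternalTableXYZ.hql holding the CREATE EXTERNAL TABLE statement, so the table definition lives in one spot.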

leftjoin
  • I clarified the question. I am not using the CLI, so source is not an option for me. – FirstName LastName Nov 28 '16 at 19:47
  • You have your scripts stored in S3, right? In that case you can use the `cat` command to concatenate your scripts with the common parameters and then execute the concatenated script (see the sketch below this thread). – leftjoin Nov 28 '16 at 20:16
  • So you mean to say I should run a cat command to combine the two files and then upload/sync the files via aws cli before running the hive job? That can actually work. So something like this: cat commonOptions.sql script1_base.sql > script1.sql; aws s3 sync script1.sql ; . Did I get the gist of your solution correctly? – FirstName LastName Nov 28 '16 at 22:35
  • Yes, I mean the same. Remember about eventual consistency in S3... If you create a unique file name each time (for example with a date suffix), that is safer than re-creating the same file. I'm not sure, maybe eventual consistency is not an issue for your case. – leftjoin Nov 29 '16 at 08:38
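The concatenate-and-upload flow discussed in the comments might look roughly like this (a minimal sketch; the bucket, key prefix, and file names are placeholders, and aws s3 cp is used here since a single file is being uploaded):

# build a uniquely named script so S3 eventual consistency is less of a concern
SUFFIX=$(date +%Y%m%d%H%M%S)
cat commonOptions.sql script1_base.sql > script1_${SUFFIX}.sql
aws s3 cp script1_${SUFFIX}.sql s3://my-bucket/hive-scripts/script1_${SUFFIX}.sql
# then point the EMR Hive step at s3://my-bucket/hive-scripts/script1_${SUFFIX}.sql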

You should be able to use hive -i config.hql -f script_A.hql, where config.hql would contain your dynamic settings. The -i flag lets you pass an initialization script that is executed before the actual Hive file passed with -f. I'm not super familiar with how AWS kicks off Hive jobs in steps, but presumably you can edit the submission arguments.
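If the step arguments are forwarded to the Hive CLI, the submission might look roughly like the sketch below (the cluster ID, bucket, and file names are placeholders, and whether -i is honored for an S3 path by the EMR Hive step runner is an assumption worth verifying):

# assumes the Hive step runner forwards -i to the Hive CLI; verify on your EMR release
aws emr add-steps --cluster-id j-XXXXXXXXXXXXX --steps \
  Type=Hive,Name="Script A",ActionOnFailure=CONTINUE,Args=[-i,s3://my-bucket/hive-scripts/config.hql,-f,s3://my-bucket/hive-scripts/script_A.hql]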

Derek Kaknes