
I have two Hive scripts which look like this:

Script A:

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.parallel=true;

... do something ...

Script B:

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.parallel=true;

... do something else ...

The options that we set at the beginning of each script are the same. Is it possible to extract them into a common place (for example, a commonoptions.sql) so that our scripts look like this:

Script A:

 <run commonoptions.sql>

... do something ...

Script B:

 <run commonoptions.sql>

... do something else ...

Ideally I would also like to extract the table definitions, so that I have:

Script A:

 <run commonoptions.sql>
 <run defineExternalTableXYZ.sql>
... do something with Table XYZ ...

Script B:

 <run commonoptions.sql>
 <run defineExternalTableXYZ.sql>
... do something else with Table XYZ ...

That way I can manage the Table XYZ definition in a single place. I am not using the Hive CLI; I am using Amazon EMR with Hive Steps.


2 Answers


You can store these configuration parameters in a common file and load it in each of your scripts using the source command:

source /tmp/common_init.hql;

You can also generate this file for each workflow from the database.
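A minimal sketch of the layout (the file name and path are placeholders; the settings are the ones from the question):

-- common_init.hql: shared settings managed in one place
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.parallel=true;

-- scriptA.hql: pull in the shared settings, then do the real work
source /tmp/common_init.hql;
-- ... do something ...

The same source line can also pull in a file such as defineExternalTableXYZ.hql holding the CREATE EXTERNAL TABLE statement, so the table definition lives in one spot.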

leftjoin
  • I clarified the question. I am not using the CLI, so source is not an option for me. – FirstName LastName Nov 28 '16 at 19:47
  • You have your scripts stored in S3, right? In that case you can use the `cat` command to concatenate your scripts with the common parameters and then execute the concatenated script (see the sketch below this thread). – leftjoin Nov 28 '16 at 20:16
  • So you mean to say I should run a cat command to combine the two files and then upload/sync the files via aws cli before running the hive job? That can actually work. So something like this: cat commonOptions.sql script1_base.sql > script1.sql; aws s3 sync script1.sql ; . Did I get the gist of your solution correctly? – FirstName LastName Nov 28 '16 at 22:35
  • Yes, I mean the same. Remember about eventual consistency in S3... If you create a unique file name each time (for example with a date suffix), that is safer than re-creating the same file. I'm not sure, maybe eventual consistency is not an issue for your case. – leftjoin Nov 29 '16 at 08:38
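The concatenate-and-upload flow discussed in the comments might look roughly like this (a minimal sketch; the bucket, key prefix, and file names are placeholders, and aws s3 cp is used here since a single file is being uploaded):

# build a uniquely named script so S3 eventual consistency is less of a concern
SUFFIX=$(date +%Y%m%d%H%M%S)
cat commonOptions.sql script1_base.sql > script1_${SUFFIX}.sql
aws s3 cp script1_${SUFFIX}.sql s3://my-bucket/hive-scripts/script1_${SUFFIX}.sql
# then point the EMR Hive step at s3://my-bucket/hive-scripts/script1_${SUFFIX}.sql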

You should be able to use hive -i config.hql -f script_A.hql, where config.hql would contain your dynamic settings. The -i flag lets you pass an initialization script that is executed before the actual Hive file passed with -f. I'm not super familiar with how AWS kicks off Hive jobs in steps, but presumably you can edit the submission arguments.
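If the step arguments are forwarded to the Hive CLI, the submission might look roughly like the sketch below (the cluster ID, bucket, and file names are placeholders, and whether -i is honored for an S3 path by the EMR Hive step runner is an assumption worth verifying):

# assumes the Hive step runner forwards -i to the Hive CLI; verify on your EMR release
aws emr add-steps --cluster-id j-XXXXXXXXXXXXX --steps \
  Type=Hive,Name="Script A",ActionOnFailure=CONTINUE,Args=[-i,s3://my-bucket/hive-scripts/config.hql,-f,s3://my-bucket/hive-scripts/script_A.hql]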

Derek Kaknes