Using Hue, I've got a Hive query that takes an input (e.g. an ID number) and returns a record based on it. I need to look up multiple numbers in one go (in serial or parallel) and collate the results (i.e. list the records for each, one after the other), so the input might be:

1234567890
45345353
32423422
1323122
etc...

I've got access to Hue (which I'm supposed to use), Hive, Oozie and Beeline. How do I:

1.) extract the number for each line

2.) repeatedly call my HiveQL query passing in each number in turn

3.) supply the total output to the user in one go

I don't know Python, if that's relevant, but I could attempt a shell script.

I'm guessing one way might be to get the multi-line user input via Oozie (can it prompt a user for input?), then pass that to a shell script which extracts the number from each line and uses Beeline to run my Hive query repeatedly, with the next number as the parameter each time?
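A rough sketch of that shell approach, covering steps 1 to 3, might look like the following. It is only an illustration under assumptions: the JDBC URL, the script name `lookup.hql`, and the hivevar name `id` are all hypothetical, and `lookup.hql` is assumed to reference the value as `${hivevar:id}`.

```shell
#!/bin/sh
# Hypothetical connection string; replace with the real HiveServer2 URL.
JDBC_URL="${JDBC_URL:-jdbc:hive2://localhost:10000/default}"

# Step 1: trim surrounding whitespace and keep only lines made up
# entirely of digits (blank lines and text such as "etc..." are dropped).
extract_ids() {
    sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' | grep -E '^[0-9]+$'
}

# Step 2: run the query once for a single ID.
# lookup.hql is a hypothetical file assumed to use ${hivevar:id}.
run_query() {
    beeline -u "$JDBC_URL" --silent=true --outputformat=tsv2 \
        --hivevar id="$1" -f lookup.hql
}

# Step 3: read IDs from stdin, run the query for each one in turn and
# write all results to stdout, one labelled block per ID.
lookup_all() {
    extract_ids | while IFS= read -r id; do
        echo "== $id =="
        # Redirect stdin so the query command cannot swallow the remaining IDs.
        run_query "$id" < /dev/null
    done
}
```

Calling `lookup_all < ids.txt > results.txt` would then collect every record in a single file. Each call starts a separate Beeline session, so this is likely to be slow for long lists.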

Thanks

Alex Kerr
  • Why can't you join your query with the input dataset instead of "repeatedly call my HiveQL query"? – leftjoin Jan 23 '21 at 08:08
  • Thanks @leftjoin, but I'm new to this and don't understand how to do that, sorry... – Alex Kerr Jan 23 '21 at 13:08
  • What's the problem with passing your input as a parameter into an IN filter, for example? Please show what you have tried. This may help: https://stackoverflow.com/a/56963448/2700344 and https://stackoverflow.com/a/65235596/2700344 – leftjoin Jan 23 '21 at 13:16
  • @leftjoin thanks again. I ended up using 'IN' in my where clause, e.g. 'where id_num IN {$IDs_list}' so user gets prompted for IDs_list when the Oozie job runs. Also, do you know how to branch based on user input? Can't see how to allow the user to input anything into an Oozie workflow. Thanks – Alex Kerr Jan 25 '21 at 17:36
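
For the IN-based approach mentioned in the last comment, the multi-line input has to be collapsed into a single parenthesised list before it is passed as the parameter. A minimal sketch in shell, assuming one ID per line and digits-only IDs (the function name `build_in_list` is made up for illustration):

```shell
#!/bin/sh
# Read IDs from stdin (one per line) and print them as "(id1,id2,...)".
# Lines that are not purely digits are dropped, which also keeps
# arbitrary text out of the query.
build_in_list() {
    ids=$(sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
        | grep -E '^[0-9]+$' \
        | paste -s -d, -)
    # Fail rather than emit "()", which would not be a valid IN list.
    [ -n "$ids" ] || return 1
    printf '(%s)\n' "$ids"
}
```

With the sample input from the question this prints `(1234567890,45345353,32423422,1323122)`, which can be pasted in as the `IDs_list` value when the job prompts for it, or passed from a script, for example via Beeline's `--hivevar IDs_list="$(build_in_list < ids.txt)"`, assuming the query then refers to it as `${hivevar:IDs_list}`.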
