I would like to run multiple Hive queries, preferably in parallel rather than sequentially, and store the output of each query in a CSV file. For example, query1's output in csv1, query2's output in csv2, etc. I would be running these queries after leaving work, with the goal of having output to analyze during the next business day. I am interested in using a bash shell script, because then I'd be able to set up a cron task to run it at a specific time of day.
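For reference, a crontab entry like the following is what I have in mind; it would start a script at 7 PM on weekdays (the script path is just a placeholder):

```
# minute hour day-of-month month day-of-week  command
0 19 * * 1-5 /path/to/run_hive_queries.sh
```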
I know how to store the results of a HiveQL query in a CSV file, one query at a time. I do that with something like the following:
hive -e "SELECT * FROM db.table;" | tr "\t" "," > example.csv
The problem with the above is that I have to monitor when the process finishes and manually start the next query. I also know how to run multiple queries, in sequence, like so:
hive -f hivequeries.hql
Is there a way to combine these two methods? Is there a smarter way to achieve my goals?
Code answers are preferred since I do not know bash well enough to write it from scratch.
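In case it helps, here is a sketch of what I imagine combining the two methods might look like: each query runs as a background job and wait blocks until they all finish. The run_query helper, query text, and file names are my own placeholders, and I have not tested this:

```shell
#!/usr/bin/env bash
# Hypothetical script: run several Hive queries in parallel,
# each one writing its own CSV file.

run_query() {
    # hive -e prints tab-separated rows; tr converts tabs to commas.
    local query="$1"
    local outfile="$2"
    hive -e "$query" | tr '\t' ',' > "$outfile"
}

# Launch each query as a background job with &.
run_query "SELECT * FROM db.table1;" csv1.csv &
run_query "SELECT * FROM db.table2;" csv2.csv &

# Block until every background job has finished.
wait
```

I am not sure whether this is safe (for example, whether running several hive CLI invocations at once causes problems), which is why I am asking.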
This question is a variant of another question: How do I output the results of a HiveQL query to CSV?