I have a Spark 2.1 job in which I maintain multiple Dataset objects/RDDs that represent different queries over our underlying Hive/HDFS datastore. I've noticed that if I simply iterate over the List of Datasets, they execute one at a time. Each individual query runs in parallel internally, but I feel we are not making full use of our resources because the different Datasets do not also run in parallel with each other.
There doesn't seem to be much out there about doing this. Most questions appear to be about parallelizing a single RDD or Dataset, not about running several in parallel within the same job.
Is this inadvisable for some reason? Can I just use an executor service, thread pool, or futures to do this?
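To make the question concrete, this is roughly the pattern I have in mind. It is only a sketch in plain Java with no Spark dependency: each `Callable` stands in for an action on one Dataset (for example a call to `dataset.count()`), and `ParallelActions` and `runAll` are hypothetical names I made up for illustration, not part of any Spark API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper: submits every action to a fixed thread pool and
// collects the results in the same order the actions were given.
class ParallelActions {

    static <T> List<T> runAll(List<Callable<T>> actions, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit everything first so the actions can overlap.
            List<Future<T>> futures = new ArrayList<>();
            for (Callable<T> action : actions) {
                futures.add(pool.submit(action));
            }
            // Then block on each future in submission order.
            List<T> results = new ArrayList<>();
            for (Future<T> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } catch (ExecutionException e) {
            // Surface the failure of the underlying action.
            throw new RuntimeException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Placeholders: in the real job each of these would trigger
        // an action on one Dataset (assumption, e.g. dataset.count()).
        List<Callable<Long>> actions = new ArrayList<>();
        actions.add(() -> 10L);
        actions.add(() -> 20L);
        actions.add(() -> 30L);
        System.out.println(runAll(actions, 3));
    }
}
```

In the real job the placeholder lambdas would be replaced by the per-Dataset actions, and the pool size would be whatever number of concurrent queries the cluster can reasonably take.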
Thanks!