Measure
Before deciding which implementation path to choose, measure how long it takes to process a single item. Based on that, you can choose the size of the work chunk submitted to the thread pool, queue, or cluster. Very small work chunks increase coordination overhead; very large work chunks take a long time to process, so you get less granular progress information.
Single-machine processing is simpler to implement, troubleshoot, maintain, and reason about.
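A minimal timing sketch for one item; processRecord() here is a hypothetical placeholder for your actual per-item logic:

```java
public class MeasureSingleItem {
    public static void main(String[] args) {
        String sampleRecord = "example"; // placeholder input
        long start = System.nanoTime();
        processRecord(sampleRecord);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("Single item took " + elapsedMs + " ms");
    }

    private static void processRecord(String record) {
        record.toUpperCase(); // stand-in for the real work
    }
}
```

In practice, average the timing over a reasonable sample of items rather than a single one.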
Processing on a single machine
Use java.util.concurrent.ExecutorService.
Submit each work item using the submit(Callable<T> task) method: https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ExecutorService.html#submit-java.util.concurrent.Callable-
Create an instance of ExecutorService using java.util.concurrent.Executors.newFixedThreadPool(int nThreads). Choose a reasonable value for nThreads; the number of CPU cores is a reasonable starting value. You may use more threads if the processing involves blocking IO calls (database, HTTP).
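A minimal sketch of this approach, assuming records are Strings and processRecord() is a hypothetical placeholder for your per-item logic:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelProcessing {
    public static void main(String[] args) throws Exception {
        List<String> records = Arrays.asList("a", "b", "c"); // placeholder input
        int nThreads = Runtime.getRuntime().availableProcessors(); // start with the CPU core count
        ExecutorService executor = Executors.newFixedThreadPool(nThreads);

        List<Future<String>> futures = new ArrayList<>();
        for (String record : records) {
            // submit(Callable<T>) returns a Future holding the result of this work item
            futures.add(executor.submit(() -> processRecord(record)));
        }

        for (Future<String> future : futures) {
            System.out.println(future.get()); // blocks until that item's result is ready
        }
        executor.shutdown();
    }

    // Hypothetical per-item processing; replace with your own logic
    private static String processRecord(String record) {
        return record.toUpperCase();
    }
}
```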
Processing on multiple machines - cluster
Submit processing jobs to a cluster processing technology such as Spark, Hadoop, or Google BigQuery.
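A minimal Spark sketch, assuming the spark-core dependency is on the classpath; the input list and per-item logic are placeholders, not the only way to feed data into a job:

```java
import java.util.Arrays;
import java.util.List;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class SparkJob {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("record-processing");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            List<String> records = Arrays.asList("a", "b", "c"); // placeholder input
            // Spark splits the data into partitions and distributes them across the cluster
            List<String> results = sc.parallelize(records)
                                     .map(String::toUpperCase)   // per-item processing
                                     .collect();                 // gather results on the driver
            results.forEach(System.out::println);
        }
    }
}
```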
Processing on multiple machines - queue
You can submit your records to any queue system (Kafka, RabbitMQ, ActiveMQ, etc.), then have multiple machines consume items from the queue. You can add or remove consumers at any time. This approach works well if you do not need a single place that collects the processing results.
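A minimal consumer sketch using Kafka, assuming the kafka-clients dependency and a broker at localhost:9092; the topic name "records" and processRecord() are hypothetical placeholders:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class QueueConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "record-processors"); // consumers in the same group share the work
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("records"));
            while (true) {
                ConsumerRecords<String, String> batch = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : batch) {
                    processRecord(record.value()); // per-item processing, replace with your own logic
                }
            }
        }
    }

    private static void processRecord(String value) {
        System.out.println(value.toUpperCase());
    }
}
```

Running this same consumer on more machines (with the same group.id) spreads the work across them; stopping instances shrinks the pool again.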