I have been researching how to run Python code from Java, and I have seen a few options for doing that.
My scenario is a little different. Imagine a Spark application written in Java that processes a large distributed dataset (say 3 billion records, around 1 TB in size). The Python code will be called once for every single record: the Java code passes it an Avro record, and the Python code processes it and returns a result.
Since performance is important and we will be dealing with large datasets, I am trying to figure out the best way to approach this problem.
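
To put rough numbers on why the per-call cost matters here, this is a back-of-envelope sketch in Python. The record count and dataset size come from the scenario above; the per-call overhead figures are purely hypothetical values I picked for illustration, not measurements of any particular Java-to-Python bridge.

```python
# Back-of-envelope estimate of Java -> Python call overhead at this scale.
# RECORDS and DATASET_BYTES come from the scenario; the per-call overheads
# tried below are assumed values for illustration, not measurements.

RECORDS = 3_000_000_000            # 3 billion records
DATASET_BYTES = 1_000_000_000_000  # ~1 TB (decimal)


def avg_record_bytes(total_bytes: int = DATASET_BYTES, records: int = RECORDS) -> float:
    """Average serialized size of one record, in bytes."""
    return total_bytes / records


def total_overhead_core_hours(per_call_seconds: float, records: int = RECORDS) -> float:
    """Time spent only on crossing the Java/Python boundary, in core-hours."""
    return per_call_seconds * records / 3600


if __name__ == "__main__":
    print(f"average record size: {avg_record_bytes():.0f} bytes")
    # Hypothetical per-call overheads, from very cheap to fairly expensive.
    for label, overhead in [
        ("1 microsecond", 1e-6),
        ("100 microseconds", 1e-4),
        ("1 millisecond", 1e-3),
    ]:
        hours = total_overhead_core_hours(overhead)
        print(f"{label} per call: {hours:,.1f} core-hours of overhead")
```

With records averaging only a few hundred bytes each, the cost of crossing the boundary once per record could easily outweigh the actual processing, which is why I care about which option has the lowest per-call overhead.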