
I have a process developed in PySpark, but part of it depends on a sub-process that is implemented in Scala.

For this reason, I need to make a call from PySpark that runs the code implemented in the Scala script and then, once that sub-process finishes, return to the PySpark process.

Is there any way to execute a call to Scala code from PySpark?

– Also
  • I think you can compile your Scala script and then use the *pipe* command from pyspark https://spark.apache.org/docs/2.3.0/api/python/pyspark.html?highlight=pipe#pyspark.RDD.pipe to fork that sub-process. At the following link https://docs.databricks.com/user-guide/faq/running-c-plus-plus-code.html you can find an example with compiled C++ code, which shows how to call an executable and then get its output (a sketch of the PySpark side follows after these comments) – titiro89 May 16 '18 at 11:14
  • @eliasah Thanks for helping me make better contributions. Could you please tell me where I can find the post related to my comment? Thanks. – Also May 16 '18 at 13:05
  • Which related post, @Also? The link to the dupe question is under your question title... – eliasah May 16 '18 at 13:07
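
To make the suggested approach concrete, here is a minimal sketch of the PySpark side only, based on the RDD.pipe idea from the first comment. The script name /opt/jobs/run_scala.sh and the one-record-per-line protocol are assumptions for illustration, not part of the original question; the Scala code is assumed to be compiled and wrapped in that executable, available at the same path on every worker node.

    # Minimal sketch of the pipe-based approach (PySpark side only).
    # Assumption: /opt/jobs/run_scala.sh is an executable wrapper around the
    # compiled Scala sub-process that reads one record per line on stdin and
    # writes one result per line on stdout, and it exists on every worker.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("pipe-to-scala").getOrCreate()
    sc = spark.sparkContext

    # Data prepared on the PySpark side; pipe exchanges plain text lines,
    # so anything more structured has to be serialized/parsed explicitly.
    rdd = sc.parallelize(["1", "2", "3"])

    # Fork the external (Scala) process on each partition and stream that
    # partition's records through its stdin/stdout.
    piped = rdd.pipe("/opt/jobs/run_scala.sh")

    # Back in PySpark: the Scala output comes back as an RDD of strings.
    print(piped.collect())

    spark.stop()

If the wrapper script is not pre-installed on the workers, it can be shipped to them first (for example with sc.addFile or the --files option of spark-submit). Note that pipe only moves text lines between the two processes, so both sides have to agree on a line-based input/output format.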

0 Answers