3

My team is trying to transition from Zeppelin to Jupyter for an application we've built, because Jupyter seems to have more momentum, more opportunities for customization, and to be generally more flexible. However, there are a couple of things in Zeppelin that we haven't been able to find equivalents for in Jupyter.

The main one is to have multi-lingual Spark support - is it possible in Jupyter to create a Spark data frame that's accessible via R, Scala, Python, and SQL, all within the same notebook? We've written a Scala Spark library to create data frames and hand them back to the user, and the user may want to use various languages to manipulate/interrogate the data frame once they get their hands on it.

Is Livy a solution to this in the Jupyter context, i.e. will it allow multiple connections (from the various language front-ends) to a common Spark back-end so they can manipulate the same data objects? I can't quite tell from Livy's web site whether a given connection only supports one language, or whether each session can have multiple connections to it.

If Livy isn't a good solution, can BeakerX fill this need? The BeakerX website says two of its main selling points are:

  • Polyglot magics and autotranslation, allowing you to access multiple languages in the same notebook, and seamlessly communicate between them;
  • Apache Spark integration including GUI configuration, status, progress, interrupt, and tables;

However, we haven't been able to use BeakerX to connect to anything other than a local Spark cluster, so we've been unable to verify how the polyglot implementation actually works. If we can get a connection to a YARN cluster (e.g. an EMR cluster in AWS), would the polyglot support give us access to the same session using different languages?

Finally, if neither of those work, would a custom Magic work? Maybe something that would proxy requests through to other kernels, e.g. spark and pyspark and sparkr kernels? The problem I see with this approach is that I think each of those back-end kernels would have their own Spark context, but is there a way around that I'm not thinking of?

(I know SO questions aren't supposed to ask for opinions or recommendations, so what I'm really asking for here is whether a possible path to success actually exists for the three alternatives above, not necessarily which of them I should choose.)

Ken Williams
  • On one hand, indeed, this kind of question isn't a good fit for SO. On the other hand, one of Zeppelin's strengths is sharing context between languages. In this case, you'd need middleware that allows this. Take a look here: https://github.com/awesome-spark/awesome-spark#middleware. Maybe you'll find what you are looking for – eliasah Aug 14 '18 at 09:39
  • Otherwise, I'm obliged to vote to close this question for being off-topic. – eliasah Aug 14 '18 at 09:40

2 Answers

2

Another possibility is the SoS (Script of Scripts) polyglot notebook: https://vatlab.github.io/sos-docs/index.html#documentation. It supports multiple Jupyter kernels in one notebook. SoS has several natively supported languages (R, Ruby, Python 2 & 3, Matlab, SAS, etc.). Scala is not supported natively, but it's possible to pass information to the Scala kernel and capture its output. There's also a seemingly straightforward way to add a new language (one that already has a Jupyter kernel); see https://vatlab.github.io/sos-docs/doc/documentation/Language_Module.html

schrödingcöder
1

I am using Livy in my application. The way it works is that any user can connect to an already established Spark session using REST (asynchronous calls). We have a cluster to which Livy sends Scala code for execution. It is up to you whether you close the session after sending the Scala code or not. If the session stays open, anyone with access can send Scala code again to do further processing. I have not tried sending different languages in the same session created through Livy, but I know that Livy supports three languages in interactive mode: R, Python, and Scala. So, theoretically, you should be able to send code in any of those languages for execution.
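A minimal, untested sketch of what sending different languages to one Livy session might look like from Python. It assumes a Livy version whose REST API accepts a per-statement `kind` (`spark`, `pyspark`, `sparkr`, or `sql`), that the interpreters in one session share the same Spark session so a temp view registered from Scala is visible to the others, and that the session has already reached the idle state. The host in `LIVY_URL`, the view name, and the helper names are all hypothetical.

```python
import json
import urllib.request

# Assumption: a Livy server on a hypothetical host, using Livy's default port.
LIVY_URL = "http://localhost:8998"

# Statement kinds accepted by Livy's statements endpoint (assumed Livy >= 0.5).
ALLOWED_KINDS = {"spark", "pyspark", "sparkr", "sql"}


def statement_payload(code, kind):
    """Build the JSON body for POST /sessions/{id}/statements."""
    if kind not in ALLOWED_KINDS:
        raise ValueError(f"unsupported kind: {kind}")
    return {"code": code, "kind": kind}


def polyglot_statements(view_name):
    """Scala registers a temp view; Python, R, and SQL then read it by name."""
    return [
        statement_payload(
            f'spark.range(5).toDF("n").createOrReplaceTempView("{view_name}")',
            "spark",
        ),
        statement_payload(f'spark.table("{view_name}").count()', "pyspark"),
        statement_payload(f'head(sql("SELECT * FROM {view_name}"))', "sparkr"),
        statement_payload(f"SELECT sum(n) AS total FROM {view_name}", "sql"),
    ]


def post_json(path, body):
    """POST a JSON body to Livy and return the decoded JSON response."""
    request = urllib.request.Request(
        LIVY_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def submit_all(session_id, statements):
    """Submit each statement to an existing session; returns Livy's replies.

    Statements run asynchronously, so each reply only carries the statement
    id and state; results must be fetched later with a GET on that statement.
    """
    path = f"/sessions/{session_id}/statements"
    return [post_json(path, body) for body in statements]
```

With a running server, something like `submit_all(session_id, polyglot_statements("shared_df"))` would queue the four statements in order. Whether the Python, R, and SQL statements actually see the view created from Scala is exactly the part that needs to be verified on your cluster.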

Hope it helps to some extent.

Prashant