My team is trying to transition from Zeppelin to Jupyter for an application we've built, because Jupyter seems to have more momentum, more opportunities for customization, and be generally more flexible. However, there are a couple of things Zeppelin we haven't been able to equivalents for in Jupyter.
The main one is to have multi-lingual Spark support - is it possible in Jupyter to create a Spark data frame that's accessible via R, Scala, Python, and SQL, all within the same notebook? We've written a Scala Spark library to create data frames and hand them back to the user, and the user may want to use various languages to manipulate/interrogate the data frame once they get their hands on it.
Is Livy a solution to this in the Jupyter context, i.e. will it allow multiple connections (from the various language front-ends) to a common Spark back-end so they can manipulate the same data objects? I can't quite tell from Livy's web site whether a given connection only supports one language, or whether each session can have multiple connections to it.
If Livy isn't a good solution, can BeakerX fill this need? The BeakerX website says two of its main selling points are:
- Polyglot magics and autotranslation, allowing you to access multiple languages in the same notebook, and seamlessly communicate between them;
- Apache Spark integration including GUI configuration, status, progress, interrupt, and tables;
However, we haven't been able to use BeakerX to connect to anything other than a local Spark cluster, so we've been unable to verify how the polyglot implementation actually works. If we can get a connection to a Yarn cluster (e.g. an EMR cluster in AWS), would the polyglot support give us access to the same session using different languages?
Finally, if neither of those work, would a custom Magic work? Maybe something that would proxy requests through to other kernels, e.g. spark
and pyspark
and sparkr
kernels? The problem I see with this approach is that I think each of those back-end kernels would have their own Spark context, but is there a way around that I'm not thinking of?
(I know SO questions aren't supposed to ask for opinions or recommendations, so what I'm really asking for here is whether a possible path to success actually exists for the three alternatives above, not necessarily which of them I should choose.)