14

The pyspark3, pyspark, and spark kernels in the JupyterHub Docker container on Amazon EMR do not seem to support autocompletion of function names (Tab) or docstring display (Shift+Tab). Has anyone else noticed this behaviour?

I launched a cluster with JupyterHub and Spark, then created a new notebook using the pyspark or pyspark3 kernel.

It seems to be using conda inside the Docker container. I have tried to upgrade everything, but that just breaks the whole setup.

user249806
  • I'm having the same issue. Given how easily everything breaks in my SSH session and/or via the JupyterLab terminal, I'm thinking these are custom Docker images that are very fragile (I've seen the same thing before with nvidia's RAPIDS images). It's disappointing and I've never found a way to make them more robust without simply creating my own image from something like the jupyter/ Docker images as a starting point. – emigre459 Dec 09 '19 at 03:32
  • I encountered the same issue. This is why I wrote this blog post in which I installed my own jupyter. https://www.perfectlyrandom.org/2018/08/11/setup-spark-cluster-on-aws-emr/ When I installed it on my own, I was able to use the missing features (see screenshot at the end of the post). – Ankur Jun 25 '20 at 07:14
  • @emigre459 also the notebook will timeout after too long https://aws.amazon.com/premiumsupport/knowledge-center/emr-session-not-found-http-request-error/ – qwr Nov 01 '21 at 00:26
  • Kernel will die if out of memory also, leading to session error https://stackoverflow.com/questions/58062824/session-isnt-active-pyspark-in-an-aws-emr-cluster – qwr Nov 01 '21 at 05:22

1 Answer

0

Using EMR 5.33.1, JupyterHub 1.1.0, and Spark 2.4.7, tab suggestions work for me with the pyspark kernel once the notebook is set to "Trusted".

I believe tab suggestions are not enabled by default because an untrusted notebook is considered code "the user opened but did not execute". See https://jupyter-notebook.readthedocs.io/en/stable/security.html
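If trust really is the deciding factor, a notebook can also be marked trusted from the command line with `jupyter trust`, which the security page linked above describes. Below is a minimal sketch that wraps that command in Python. The helper names `trust_command` and `trust_notebook` and the file name `analysis.ipynb` are hypothetical, and I am assuming the code runs in the same environment as the notebook server (on EMR that presumably means inside the JupyterHub container). I have not verified that this alone restores completion on every EMR release.

```python
import shutil
import subprocess


def trust_command(notebook_path, executable="jupyter"):
    """Build the command line for `jupyter trust <notebook>`."""
    return [executable, "trust", notebook_path]


def trust_notebook(notebook_path, executable="jupyter"):
    """Mark a notebook as trusted; return False if the CLI is not on PATH.

    Hypothetical helper: it only shells out to the `jupyter trust`
    command and raises CalledProcessError if that command fails.
    """
    if shutil.which(executable) is None:
        return False
    subprocess.run(trust_command(notebook_path, executable), check=True)
    return True
```

Calling `trust_notebook("analysis.ipynb")` has the same intent as choosing "Trust Notebook" in the UI. Reload the notebook afterwards so the new trust state is picked up.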

qwr