
Sorry for the newbie Jupyter question -

I've installed Jupyter & PySpark using this manual - https://blog.sicara.com/get-started-pyspark-jupyter-guide-tutorial-ae2fe84f594f

Everything seems to work, but I don't get autocomplete for some "nested" functions.

For example - running "spark" -> I get the Spark session.

When I press Tab after "spark." -> I get a list of suggestions such as "read".

But pressing Tab after "spark.read." doesn't show anything, though I would expect options such as "csv", "parquet", etc.

Important note - running "spark.read.csv("1.txt")" works

Also - I tried applying the suggestions from `ipython tab autocomplete does not work on imported module`, but it didn't work.

What am I missing?

Vitali Melamud
  • The reason might be more prosaic: I suppose Spark is quite heavy, hence it takes a lot of time to parse dependencies. If the number of returned suggestions is too big (or it takes too much time), the process might be killed; you may want to check that. – Szymon Maszke Feb 10 '19 at 18:43
  • Thanks for the suggestion! How can I check this? By the way, when I run a = spark.read and later complete on a. I get all the suggestions I was hoping for (sketched just below these comments). – Vitali Melamud Feb 10 '19 at 19:43
  • You could observe resource usage and find the process responsible for completion; I assume CPU usage would skyrocket while the library is parsed. What you've written above may indicate that's actually the case; maybe someone else will be able to pinpoint the issue further. – Szymon Maszke Feb 10 '19 at 20:58
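
For reference, the workaround from the comments looks like this as a minimal sketch (it assumes spark is the SparkSession created in the linked tutorial; the behaviour described is as reported in this thread, not verified independently):

# Bind the intermediate object to a name first. spark.read returns a
# DataFrameReader, and completing on an already-evaluated instance
# sidesteps the on-the-fly inference that seems to stall on spark.read.<TAB>
a = spark.read
# Pressing Tab after "a." now lists csv, json, parquet, text, ...
df = a.csv("1.txt")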

2 Answers


I developed a Jupyter Notebook extension based on TabNine, which provides code auto-completion based on deep learning. Of course, it also supports PySpark. Here's the GitHub link to my work: jupyter-tabnine.

It's available on the PyPI index now. Simply issue the following commands, then enjoy it :)

pip3 install jupyter-tabnine
jupyter nbextension install --py jupyter_tabnine
jupyter nbextension enable --py jupyter_tabnine
jupyter serverextension enable --py jupyter_tabnine
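
A restart of the notebook server may be needed before the completions appear. As a quick sanity check (my addition, not part of the original instructions), the extension should show up as enabled in:

jupyter nbextension list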


Wenmin Wu

This can be done by manually importing readline in the session, or by setting the PYTHONSTARTUP environment variable for Python.

  1. In a Python session / notebook:
import rlcompleter, readline               # rlcompleter registers a completer with readline
readline.parse_and_bind("tab: complete")   # bind the Tab key to completion
  2. To enable it on PySpark startup (my case):

.bash_profile

export PYTHONSTARTUP="$HOME/.pythonrc"

.pythonrc

import rlcompleter, readline
readline.parse_and_bind("tab: complete")
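
Since PYTHONSTARTUP applies to every interactive Python session, a slightly more defensive .pythonrc avoids breaking startup on systems without readline (the try/except guard is my addition, not part of the original answer):

# .pythonrc -- executed by every interactive Python session via PYTHONSTARTUP
try:
    import rlcompleter, readline               # rlcompleter registers the completer
    readline.parse_and_bind("tab: complete")   # bind Tab to trigger completion
except ImportError:
    pass  # readline unavailable (e.g. some Windows builds); skip completion
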
Gabe