I need to customize the Python interpreter for PySpark: the "interpreter" would actually be the spark-submit (aka pyspark) shell script. The intent is to run PySpark jobs within the Python console. Running them from a Run Configuration would also be fine as an alternative approach. I use IntelliJ Ultimate, which has good Python support, except perhaps for this particular use case.

Let us compare to PyCharm, specifically its ability to customize the interpreter, including choosing a local, remote, or virtualenv interpreter:

enter image description here

IntelliJ Ultimate seems to lack those options: instead, it points to the libraries of a Python SDK. That is not sufficient for this use case:

enter image description here

Here is the dropdown: notice there is no way to add a custom python interpreter.

enter image description here

So is there a way in IntelliJ to set the interpreter path? I want to set it to $SPARK_HOME/bin/pyspark.

WestCoastProjects
  • Have you added the interpreter? The support definitely exists – OneCricketeer Feb 26 '17 at 04:49
  • @cricket_007 Please clarify. The above dialog is all that I have found: as we can see there is no means on it to specify the `$SPARK_HOME/bin/pyspark` binary. – WestCoastProjects Feb 26 '17 at 04:54
  • "use specified interpreter" has a drop down, and you should be able to add additional ones – OneCricketeer Feb 26 '17 at 04:56
  • @cricket_007 The only options are and `Python 2.7.12`. *I can not add a custom entry*. I also updated the OP. – WestCoastProjects Feb 26 '17 at 04:59
  • You're in the wrong dialog window, then. Open up the Project settings and go to "Global libraries", i think, or "SDK" – OneCricketeer Feb 26 '17 at 05:05
  • @cricket_007 Those windows do not have anything for interpreters: they permit adding e.g. `pyspark.zip` to the libraries. That does not resolve the issue of requiring `spark-submit` (aka `pyspark`) to be invoked. Only changing the interpreter achieves the required steps for python packaging performed by `spark-submit`. – WestCoastProjects Feb 26 '17 at 05:07

1 Answer

PyCharm and IntelliJ have the exact same options to add and configure Python code.

PyCharm just makes it easier.

Those windows do not have anything for interpreters

Pretty sure they do... You add interpreters here.

bin/pyspark is not an interpreter; it is a shell script. You just set the regular Python interpreter. You also need to add the PySpark libraries. (See below)

enter image description here

Then, you configure the environment variables here (Run Configurations) (see those below)

enter image description here


As far as PySpark libraries go, you have to add these (use the full path, not variables)

  • $SPARK_HOME/python/
  • $SPARK_HOME/python/lib/py4j-X.X-src.zip

You also need to set these variables in the Edit Configurations window shown

  • SPARK_HOME = path to spark
  • PYTHONPATH = path to py4j-X.X-src.zip (also need to append the path to the current python interpreter's directory, I believe)
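
For anyone who prefers to do this in code rather than in the Edit Configurations window, here is a minimal sketch of the same setup applied from inside the script. It is an assumption-laden alternative, not what the dialogs above do: `pyspark_paths` and `configure_pyspark` are hypothetical helper names, and the code assumes the standard Spark layout listed above (`python/` and `python/lib/py4j-X.X-src.zip` under the Spark directory). The py4j version is matched with a glob because it differs between Spark releases.

```python
import glob
import os
import sys


def pyspark_paths(spark_home):
    """Return the two library entries listed above: python/ and the py4j source zip."""
    python_dir = os.path.join(spark_home, "python")
    lib_dir = os.path.join(python_dir, "lib")
    # The py4j version (the X.X part) depends on the Spark release, so glob for it.
    zips = sorted(glob.glob(os.path.join(lib_dir, "py4j-*-src.zip")))
    if not zips:
        raise FileNotFoundError("no py4j-*-src.zip found under " + lib_dir)
    return [python_dir, zips[-1]]


def configure_pyspark(spark_home):
    """Set SPARK_HOME and put the PySpark libraries on sys.path for this process.

    Hypothetical helper: it mirrors the SPARK_HOME / PYTHONPATH settings from the
    run configuration, but only affects the current Python process.
    """
    os.environ["SPARK_HOME"] = spark_home
    paths = pyspark_paths(spark_home)
    # Insert in reverse so the final order is [python/, py4j zip, ...rest].
    for path in reversed(paths):
        if path not in sys.path:
            sys.path.insert(0, path)
    return paths
```

Calling `configure_pyspark("/full/path/to/spark")` before the first `import pyspark` should have roughly the same effect as the run-configuration variables, with the path being a placeholder for your own install. It needs to run in the same process, and before the import, to make any difference.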

And here's a video of some code running

https://www.youtube.com/watch?v=u-P4keLaBzc

OneCricketeer
  • afaict: this does not address the point about `spark-submit` aka `pyspark`: *yes* it *is* a shell script: and **that** is what is needed for launching spark job. Adding the libraries would allow the code to compile. But it does not move the needle in terms of launching a job on a spark cluster afaik. But in any case the clear next step: *please try it out yourself*: show a tiny/simple four line `pyspark` program launched from Intellij using your approach. I'd love to be shown off target/incorrect - and with a working pyspark from IJ. – WestCoastProjects Feb 26 '17 at 07:03
  • Here's another thought: no need to even invoke `spark-submit`/`pyspark`. Just set up a python "interpreter" (actually a shell script) `my-interpreter.py` to represent an arbitrary python launcher. Show that running and we're done. – WestCoastProjects Feb 26 '17 at 07:09
  • I've ran it fine... Using those exact steps. If you want to fire to a cluster, call `setMaster` (or whatever pyspark's alternative is). For example, here's even a github starter repo. https://github.com/ybenoit/pyspark-ide-starter – OneCricketeer Feb 26 '17 at 15:25
  • I already had in place the `SPARK_HOME` and `PYTHONPATH` - since well before the OP made. Are you saying that you can get `pyspark` to launch - without invoking `spark-submit` ? Inside that script we have a call to a `jvm` program: there is no way that invoking a normal python interpreter would make that happen: `exec "${SPARK_HOME}"/bin/spark-submit pyspark-shell-main --name "PySparkShell" "$@"` . That invokes the `SparkSubmit` scala class. Although I *will* try again (already done so many times) how can launching this as regular python program (w/o invoking SparkSubmit) work? – WestCoastProjects Feb 26 '17 at 16:58
  • All I ever get when following any of those directions is `Exception: Java gateway process exited before sending the driver its port number`. – WestCoastProjects Feb 26 '17 at 17:41
  • On my Mac, I used this post http://stackoverflow.com/a/36415945/2308683 Code runs absolutely fine. – OneCricketeer Feb 26 '17 at 17:47
  • I just added to the `run configuration`: `PYSPARK_SUBMIT_ARGS="pyspark-shell my_python_script.py"` and that *did* run. Man this is rough. The intent would be to run `pyspark-shell` and have it remain resident/running: that is why i'd like to have a `python console` instead of a `run configuration` – WestCoastProjects Feb 26 '17 at 18:09
  • Then use Jupyter instead of IntelliJ ;) – OneCricketeer Feb 26 '17 at 18:10
  • I do use `jupyter` for some purposes but Intellij has great keyboard shortcuts and the best editor. It is v convenient to have the source code in the top notch IJ editor on top and the run area on the bottom. Jupyter is pretty good overall but it's not a decent editor. – WestCoastProjects Feb 26 '17 at 18:13
  • I believe there is a plugin to get Python Notebook functionality within IntelliJ. – OneCricketeer Feb 26 '17 at 18:25
  • ah yea i had forgotten . let's check that out. in any case i'm going to upvote / award. you've certainly put in the effort here. – WestCoastProjects Feb 26 '17 at 18:28
  • Looks like there is *not* a plugin for IJ- only pycharm. I am trying to get onboard with pycharm as an additional platform since it seems more python devs are over there. Are you using CE or Ultimate pycharm? – WestCoastProjects Feb 26 '17 at 18:31
  • I had Ultimate during undergrad, but didn't use the features. Just use CE right now. – OneCricketeer Feb 26 '17 at 18:34
  • k thx. For IJ the ultimate is more stable than CE that is too buggy for serious use in my experience. Cost is v low not an issue. Sounds like the pycharm CE were actually usable/stable. – WestCoastProjects Feb 26 '17 at 18:36
  • Btw, I can open ipynb files perfectly fine in IntelliJ CE – OneCricketeer Feb 26 '17 at 18:39
  • ah! i'd like to use IJ. I just tried out pycharm. It's a keyboardist hater: e.g. context menus do not recognize accelerator keys (e.g. typing `p` to select `Paste`) so you have to manually scroll down. So will try opening a nb in ij – WestCoastProjects Feb 26 '17 at 18:42
  • You'll have to download the Python plugin and configure the interpreters (as shown) – OneCricketeer Feb 26 '17 at 18:43
  • oh i've been using python in IJ for long time - it's great. just had trouble with pyspark. btw is there any way to hit "enter" to run a cell -like in jupyter? tell you what I'll make another cupcake question for that and you can get some more points. – WestCoastProjects Feb 26 '17 at 18:45
  • I deleted the question pending more familiarity: there *is* a command vs edit mode. If the only thing missing were keystroke to run then i can live with that. – WestCoastProjects Feb 26 '17 at 19:07
  • When I do `shift-enter` on IJ python plugin on a notebook it brings up a ` Start jupyter server` dialog .. and which apparently is failing. So i misinterpreted that behavior. The issue is the start nb server command fails . – WestCoastProjects Feb 26 '17 at 19:17
  • In case you were curious here is a question on the error http://stackoverflow.com/questions/42473211/non-existing-current-directory-error-when-starting-ipython-jupyter-notebook-se – WestCoastProjects Feb 26 '17 at 19:35