
I have written quite a few Spark jobs in Java/Scala, where I can run a test Spark job directly from a main() program, as long as I add the required Spark jars to the Maven pom.xml.

Now I am starting to work with PySpark, and I am wondering whether I can do something similar. For example, I am using PyCharm to run the wordCount job:

[Screenshot: the wordCount script in PyCharm]

If I just run the main() program, I get the following error:

```
Traceback (most recent call last):
  File "/Applications/PyCharm.app/Contents/helpers/profiler/run_profiler.py", line 145, in <module>
    profiler.run(file)
  File "/Applications/PyCharm.app/Contents/helpers/profiler/run_profiler.py", line 84, in run
    pydev_imports.execfile(file, globals, globals)  # execute the script
  File "/Users/edamame/PycharmProjects/myWordCount/myWordCount.py", line 6, in <module>
    from pyspark import SparkContext
ImportError: No module named pyspark

Process finished with exit code 1
```

How do I import pyspark here, so that I can run a test job from the main() program as I did in Java/Scala?
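The ImportError means the interpreter that PyCharm launches cannot find pyspark on its sys.path. As a hedged diagnostic sketch (the helper name `import_report` is hypothetical, not part of PyCharm or PySpark), running this file with the same Run configuration shows which interpreter is in use and whether it can locate the package:

```python
import importlib.util
import sys


def import_report(module_name="pyspark"):
    """Report which interpreter is running and whether it can locate module_name.

    Hypothetical helper: find_spec returns None for a top-level module
    that is not on sys.path, which is the situation behind
    "ImportError: No module named pyspark".
    """
    spec = importlib.util.find_spec(module_name)
    return {
        "python": sys.executable,
        "module": module_name,
        "found": spec is not None,
        "location": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    report = import_report()
    print("Interpreter:", report["python"])
    if report["found"]:
        print("pyspark found at", report["location"])
    else:
        # Show where this interpreter is actually looking for modules.
        print("pyspark is NOT importable; sys.path is:")
        for entry in sys.path:
            print("  ", entry)
```

If the printed interpreter is not the one you expected, or sys.path has no Spark entries, the fix is in the interpreter or Run configuration rather than in the script.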

I also tried to edit the interpreter path:

[Screenshot: interpreter path settings]

Here is my Run -> Edit Configurations screen:

[Screenshot: Run -> Edit Configurations]

Finally, here is a screenshot of my project structure:

[Screenshot: project structure]

Did I miss anything here? Thanks!

Edamame
  • Looks like you are missing the pyspark module? – OneCricketeer Jul 18 '16 at 22:30
  • Possible duplicate of [How to link PyCharm with PySpark?](http://stackoverflow.com/questions/34685905/how-to-link-pycharm-with-pyspark) – OneCricketeer Jul 18 '16 at 22:30
  • @cricket_007: I have modified my question above. I tried to "Edit interpreter paths so it contains path to $SPARK_HOME/python" as mentioned in "How to link PyCharm with PySpark?", but I can't find where to edit the interpreter path. Am I missing anything here? Thanks – Edamame Jul 18 '16 at 23:01
  • Have you followed the possible dupe? – OneCricketeer Jul 18 '16 at 23:05
  • Yes, it asked me to edit the interpreter path, but I don't see such an option ... – Edamame Jul 18 '16 at 23:20
  • I think you are in the wrong settings window. You need to go to the Run menu, then Edit Configurations; there you can edit the environment variables, the interpreter, and whatnot. – OneCricketeer Jul 19 '16 at 01:50
  • I got this working in IntelliJ (I also read the other answers on that post). Here is my "interpreter settings" window with the line I added highlighted: http://i.stack.imgur.com/iltzW.png – OneCricketeer Jul 19 '16 at 04:27
  • Hmm ... I am using the free PyCharm Community Edition. Could that be a problem? Thanks! – Edamame Jul 19 '16 at 04:31
  • It shouldn't matter. I just use the Community Edition of IntelliJ IDEA with the PyCharm plugin because I also do Android, Scala, and Java coding. – OneCricketeer Jul 19 '16 at 04:55
  • I just added a screenshot above of my Run -> Edit Configurations window. Where can I modify the interpreter path there? Thanks – Edamame Jul 19 '16 at 11:26
  • You modify the interpreter path in the first settings window you show (look in Project Structure, maybe; IntelliJ doesn't have the settings you show). You **add** the `SPARK_HOME` and `PYTHONPATH` **environment variables** in the second window you added. – OneCricketeer Jul 19 '16 at 12:19
  • Thanks. I also added the Project Structure screenshot. The only thing I can add there is a content root, but that doesn't seem to work. Is IntelliJ with the PyCharm plugin better than PyCharm? Should I actually use IntelliJ, since I couldn't even do such a simple configuration in PyCharm? – Edamame Jul 19 '16 at 17:02
  • This post helped me. Though, as you said, you work with Java/Scala, so I don't see why you need plain PyCharm when IntelliJ IDEA works fine with Python projects. http://stackoverflow.com/a/36415945/2308683 – OneCricketeer Jul 19 '16 at 17:20
  • It's just that I have used Eclipse for Java/Scala for years, and it works very well. Since Eclipse is free, I don't know whether the free IntelliJ Community Edition works as well as Eclipse, or whether the free version lacks any functionality. Does the free Community Edition cover all your needs? Thanks! – Edamame Jul 19 '16 at 17:25
  • I started with Eclipse but moved to the Community Edition of IntelliJ a few years back, and it works for my needs. You can't do Java EE or database connections without paying for IntelliJ, but there are other free ways around that. If you like PyCharm for Python, then any Java/Scala work in IntelliJ would feel very similar. – OneCricketeer Jul 19 '16 at 17:27
  • probably this would help https://medium.com/@gauravmshah/pyspark-on-intellij-with-packages-auto-complete-5e3208504707 – Gaurav Shah Dec 13 '18 at 18:21
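The comments suggest adding `SPARK_HOME` and `PYTHONPATH` as environment variables in Run -> Edit Configurations. A minimal sketch for working out those values, assuming the usual Spark binary layout where `$SPARK_HOME/python` contains the pyspark sources and `$SPARK_HOME/python/lib` contains a `py4j-<version>-src.zip` (the function name `spark_env_vars` is hypothetical):

```python
import glob
import os


def spark_env_vars(spark_home):
    """Return SPARK_HOME and PYTHONPATH values for a Run configuration.

    Hypothetical helper. Assumes the usual binary-distribution layout:
      $SPARK_HOME/python                      (pyspark package sources)
      $SPARK_HOME/python/lib/py4j-*-src.zip   (Py4J bridge that pyspark imports)
    """
    python_dir = os.path.join(spark_home, "python")
    py4j_zips = sorted(glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip")))
    if not py4j_zips:
        raise FileNotFoundError("no py4j-*-src.zip under " + os.path.join(python_dir, "lib"))
    # Normally there is exactly one zip; if several exist, take the
    # lexicographically last one (not necessarily the newest version).
    entries = [python_dir, py4j_zips[-1]]
    return {
        "SPARK_HOME": spark_home,
        "PYTHONPATH": os.pathsep.join(entries),
    }


if __name__ == "__main__":
    home = os.environ.get("SPARK_HOME")
    if not home:
        print("SPARK_HOME is not set")
    else:
        try:
            for name, value in spark_env_vars(home).items():
                print(f"{name}={value}")
        except FileNotFoundError as err:
            print(err)
```

The printed `NAME=value` pairs are what would go into the Environment variables field of the Run configuration for myWordCount.py.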

2 Answers


I finally got it working by following the steps in this post. It was really helpful!

https://medium.com/data-science-cafe/pycharm-and-apache-spark-on-mac-os-x-990af6dc6f38#.jk5hl4kz0

Edamame

I added py4j-x.x.x-src.zip and pyspark.zip (both under $SPARK_HOME/python/lib) to the project structure (Preferences > Project > Project Structure, then "+ Add Content Root"), and it worked fine.
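If editing the project structure is not an option, the same two archives can be put on `sys.path` from inside the script itself. This is a sketch under the same layout the answer relies on (pyspark.zip and py4j-x.x.x-src.zip in $SPARK_HOME/python/lib); `add_spark_zips_to_sys_path` is a hypothetical helper, not a PySpark API:

```python
import glob
import os
import sys


def add_spark_zips_to_sys_path(spark_home):
    """Prepend pyspark.zip and py4j-*-src.zip from $SPARK_HOME/python/lib to sys.path.

    Hypothetical helper that mirrors adding both archives as content roots
    in PyCharm, but does it at runtime, before the first pyspark import.
    """
    lib_dir = os.path.join(spark_home, "python", "lib")
    zips = [os.path.join(lib_dir, "pyspark.zip")]
    zips += sorted(glob.glob(os.path.join(lib_dir, "py4j-*-src.zip")))
    missing = [z for z in zips if not os.path.isfile(z)]
    if missing or len(zips) < 2:
        raise FileNotFoundError("expected pyspark.zip and py4j-*-src.zip in " + lib_dir)
    # Insert in reverse so the final order at the front is [pyspark.zip, py4j...].
    for path in reversed(zips):
        if path not in sys.path:
            sys.path.insert(0, path)
    return zips
```

Calling `add_spark_zips_to_sys_path(os.environ["SPARK_HOME"])` near the top of myWordCount.py, before `from pyspark import SparkContext`, should let the import resolve. The trade-off: this only affects that one process, whereas the content-root approach configures the IDE for every run.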

PS: PyCharm had already picked up $PYTHONPATH and $SPARK_HOME from the OS environment, where they were set in .bashrc/.bash_profile.
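Because this answer relied on PyCharm inheriting those variables from the shell startup files, it may be worth confirming that the interpreter launched by the Run configuration actually sees them; an IDE started from the macOS Dock is not guaranteed to read .bashrc/.bash_profile. A hedged sketch (the helper `missing_spark_env` is hypothetical, and it only checks that PYTHONPATH contains `$SPARK_HOME/python`):

```python
import os


def missing_spark_env(environ=None):
    """Return a list describing Spark-related settings this process cannot see.

    Hypothetical helper: an empty list means SPARK_HOME is set and
    PYTHONPATH contains $SPARK_HOME/python.
    """
    env = os.environ if environ is None else environ
    missing = [name for name in ("SPARK_HOME", "PYTHONPATH") if not env.get(name)]
    if not missing:
        spark_python = os.path.normpath(os.path.join(env["SPARK_HOME"], "python"))
        # Normalise entries so a trailing slash does not cause a false alarm.
        entries = [os.path.normpath(p) for p in env["PYTHONPATH"].split(os.pathsep) if p]
        if spark_python not in entries:
            missing.append("PYTHONPATH entry for " + spark_python)
    return missing


if __name__ == "__main__":
    problems = missing_spark_env()
    print("OK" if not problems else "Missing: " + ", ".join(problems))
```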

Sasinda Rukshan