12

My goal is to import a custom .py file into my Spark application and call some of the functions included in that file.

Here is what I tried:

I have a test file called Test.py which looks as follows:

def func():
    print("Import is working")

Inside my Spark application I do the following (as described in the docs):

sc = SparkContext(conf=conf, pyFiles=['/[AbsolutePathTo]/Test.py'])

I also tried this instead (after the Spark context is created):

sc.addFile("/[AbsolutePathTo]/Test.py")

I even tried the following when submitting my spark application:

./bin/spark-submit --packages com.datastax.spark:spark-cassandra-connector_2.10:1.5.0-M2 --py-files /[AbsolutePath]/Test.py ../Main/Code/app.py

However, I always get a name error:

NameError: name 'func' is not defined

when I call func() inside my app.py (I get the same error if I try to call Test.func()).

Finally, I also tried importing the file inside the pyspark shell with the same command as above:

sc.addFile("/[AbsolutePathTo]/Test.py")

Strangely, I do not get an error on the import, but I still cannot call func() without getting the error. Also, not sure if it matters, but I'm running Spark locally on one machine.

I really tried everything I could think of, but still cannot get it to work. Probably I am missing something very simple. Any help would be appreciated.

Kito
  • does the absolute path contain any space? Are you importing in the app.py file? – mgaido Dec 21 '15 at 15:10
  • nope, no spaces in the path. Yes, app.py is my Spark application where I'm trying to do the import. But as I said, I have the same issue if I'm trying to do an import inside a pyspark shell. – Kito Dec 21 '15 at 15:13
  • How are you importing it? – mgaido Dec 21 '15 at 15:16
  • I'm not sure what you mean by "how", other than the 3 different approaches I tried and explained in the question? – Kito Dec 21 '15 at 15:18
  • I mean, in the file app.py, how do you import the file Test.py? – mgaido Dec 21 '15 at 15:27
  • Oh, now I get it. I thought that the addFile command actually imports the Test.py, so I didn't do any other import, which is why it didn't work. Thanks for pointing me in the right direction. In case anybody will have the same issue in the future, I answered that question myself. – Kito Dec 21 '15 at 15:55
  • related to this question and answers: https://stackoverflow.com/questions/48504849/pyspark-an-error-occurred-while-calling-o51-showstring-no-module-named-xxx – X.X Oct 14 '20 at 23:03

1 Answer

18

Alright, actually my question is rather stupid. After doing:

sc.addFile("/[AbsolutePathTo]/Test.py")

I still have to import the Test.py file like I would import a regular python file with:

import Test

then I can call

Test.func()

and it works. I thought that the "import Test" was not necessary since I added the file to the Spark context, but apparently that does not have the same effect. Thanks mark91 for pointing me in the right direction.
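The fix hinges on a plain-Python rule: putting a file where the interpreter can find it does not define any names in your script; only an explicit import does. The sketch below reproduces that behaviour without Spark at all (the temporary directory merely stands in for wherever sc.addFile ships the file; the Test.py file and func mirror the question):

```python
import os
import sys
import tempfile
import textwrap

# Write a Test.py like the one in the question into a scratch directory.
# Making the file exist on disk is roughly what sc.addFile achieves.
workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, "Test.py"), "w") as f:
    f.write(textwrap.dedent('''
        def func():
            return "Import is working"
    '''))

# The file now exists, but 'func' is still undefined in this namespace:
try:
    func()
except NameError:
    print("NameError until we import")

# Make the directory importable and import the module explicitly,
# just like 'import Test' after sc.addFile in the answer.
sys.path.insert(0, workdir)
import Test

print(Test.func())  # → Import is working
```

The same logic explains the pyspark-shell observation in the question: addFile succeeds silently because it only distributes the file, and the NameError persists until the module is actually imported.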

UPDATE 28.10.2017:

As asked in the comments, here are more details on app.py:

from pyspark import SparkContext
from pyspark.conf import SparkConf

conf = SparkConf()
conf.setMaster("local[4]")
conf.setAppName("Spark Stream")
sc = SparkContext(conf=conf)
sc.addFile("Test.py")

import Test

Test.func()
Kito
    I am looking for something similar to this. Can you please post the full code (app.py) , how you are importing and calling test.func() please? – goks Oct 27 '17 at 20:20