I have been trying to do this. In the PySpark shell, I get the SparkContext as sc. But when I use the addPyFile method, it makes the resulting SparkContext None:

>>> sc2 = sc.addPyFile("/home/ec2-user/redis.zip")
>>> sc2 is None
True

What's wrong?
Below is the source code of PySpark's (v1.1.1) addPyFile. (The source links for 1.4.1 in the official PySpark docs are broken as I'm writing this.)
It returns None, because there is no return statement. See also: In Python, if a function doesn't have a return statement, what does it return?
So, if you do sc2 = sc.addPyFile("mymodule.py"), of course sc2 will be None, because .addPyFile() does not return anything!

Instead, simply call sc.addPyFile("mymodule.py") and keep using sc as the SparkContext.
def addPyFile(self, path):
    """
    Add a .py or .zip dependency for all tasks to be executed on this
    SparkContext in the future. The C{path} passed can be either a local
    file, a file in HDFS (or other Hadoop-supported filesystems), or an
    HTTP, HTTPS or FTP URI.
    """
    self.addFile(path)
    (dirname, filename) = os.path.split(path)  # dirname may be directory or HDFS/S3 prefix

    if filename.endswith('.zip') or filename.endswith('.ZIP') or filename.endswith('.egg'):
        self._python_includes.append(filename)
        # for tests in local mode
        sys.path.append(os.path.join(SparkFiles.getRootDirectory(), filename))
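For completeness, here is what the snippet from the question should look like (a minimal sketch, assuming the same /home/ec2-user/redis.zip and that the zip actually contains an importable redis module):

>>> sc.addPyFile("/home/ec2-user/redis.zip")  # returns None; called only for its side effect
>>> sc is None                                # sc itself is untouched and still usable
False
>>> def check(_):
...     import redis                          # importable inside tasks thanks to addPyFile
...     return redis.__name__
...
>>> sc.parallelize([0]).map(check).first()    # run the import on an executor
'redis'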