I have been trying to do this. In the PySpark shell, the SparkContext is available as sc. But when I call its addPyFile method, the resulting SparkContext is None:

>>> sc2 = sc.addPyFile("/home/ec2-user/redis.zip")
>>> sc2 is None
True

What's wrong?


1 Answer


Below is the source code for pyspark's (v1.1.1) addPyFile. (The source links for 1.4.1 in the official pyspark docs are broken as I'm writing this.)

It returns None, because there is no return statement. See also: In Python, if a function doesn't have a return statement, what does it return?
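You can see this Python behavior with a minimal standalone example (nothing Spark-specific here):

>>> def no_return():
...     pass  # no return statement anywhere
...
>>> no_return() is None
True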

So if you do sc2 = sc.addPyFile("mymodule.py"), sc2 will of course be None, because .addPyFile() does not return anything!

Instead, simply call sc.addPyFile("mymodule.py") and keep using sc as the SparkContext.
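For example, a minimal sketch using the redis.zip path from your question (assuming the archive contains a top-level redis package):

>>> sc.addPyFile("/home/ec2-user/redis.zip")  # mutates sc in place; returns None
>>> import redis                              # the shipped module is now importable

The method registers the file as a side effect on the existing SparkContext, which is why there is nothing useful to assign.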

def addPyFile(self, path):
    """
    Add a .py or .zip dependency for all tasks to be executed on this
    SparkContext in the future.  The C{path} passed can be either a local
    file, a file in HDFS (or other Hadoop-supported filesystems), or an
    HTTP, HTTPS or FTP URI.
    """
    self.addFile(path)
    (dirname, filename) = os.path.split(path)  # dirname may be directory or HDFS/S3 prefix

    if filename.endswith('.zip') or filename.endswith('.ZIP') or filename.endswith('.egg'):
        self._python_includes.append(filename)
        # for tests in local mode
        sys.path.append(os.path.join(SparkFiles.getRootDirectory(), filename))