
I'm trying to run a PySpark application. The spark-submit command looks something like this:

    spark-submit --py-files /some/location/data.py /path/to/the/main/file/etl.py

My main file (etl.py) imports data.py and uses functions from it; the code looks like this:

    import data
    def main(args_dict):
        print(args_dict)
        df1 = data.get_df1(args_dict['df1name'])
        df2 = data.get_df2(args_dict['df2name'])
        ...
        ...
        ...

I'm passing the data.py file in --py-files, but when I run the spark-submit command I get ImportError: No module named 'data'. I'm trying to figure out what I'm doing wrong here. Thank you.
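
For what it's worth, here is a minimal sketch of a workaround I've been considering, assuming a SparkContext can be obtained before the import; sc.addPyFile is the programmatic counterpart of --py-files, and the path here is just the one from my setup:

    from pyspark import SparkContext

    # Get (or create) the active SparkContext.
    sc = SparkContext.getOrCreate()

    # Ship data.py to the driver and executors and put it on the Python path.
    sc.addPyFile('/some/location/data.py')

    # Defer the import until after the file has been distributed.
    import data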

