I'm trying to run a PySpark application. The spark-submit command looks something like this:
spark-submit --py-files /some/location/data.py /path/to/the/main/file/etl.py
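(For context, my understanding is that --py-files should make data.py importable on the driver and executors, much like calling addPyFile programmatically. A minimal sketch of that alternative, assuming the same paths as in my command above:)

from pyspark.sql import SparkSession

# Sketch of the programmatic equivalent I'd expect --py-files to match;
# addPyFile ships the file to the driver and executors at runtime.
spark = SparkSession.builder.appName("etl").getOrCreate()
spark.sparkContext.addPyFile("/some/location/data.py")

import data  # imported only after the file has been distributed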
My main file (etl.py) imports data.py and uses its functions; the code looks like this:
import data

def main(args_dict):
    print(args_dict)
    df1 = data.get_df1(args_dict['df1name'])
    df2 = data.get_df2(args_dict['df2name'])
    ...
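For reference, data.py just defines small DataFrame helpers. A minimal sketch of its shape (the real bodies are longer; the spark.table lookups here are only illustrative):

# data.py -- minimal sketch of the helpers etl.py calls.
from pyspark.sql import SparkSession

def get_df1(name):
    # Illustrative body: look up a registered table by name.
    return SparkSession.builder.getOrCreate().table(name)

def get_df2(name):
    return SparkSession.builder.getOrCreate().table(name)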
I'm passing data.py via --py-files, but when I run spark-submit I get ImportError: No module named 'data'.
What am I doing wrong here? Thank you.