I'm trying to run a PySpark application. The spark-submit command looks something like this:
spark-submit --py-files /some/location/data.py /path/to/the/main/file/etl.py
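(For context, my understanding is that --py-files should make data.py importable on the driver and executors, much like calling addPyFile programmatically. A minimal sketch of that alternative, assuming the same paths as in my command above:)

from pyspark.sql import SparkSession

# Sketch of the programmatic equivalent I'd expect --py-files to match;
# addPyFile ships the file to the driver and executors at runtime.
spark = SparkSession.builder.appName("etl").getOrCreate()
spark.sparkContext.addPyFile("/some/location/data.py")

import data  # imported only after the file has been distributed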
My main file (etl.py) imports data.py and uses its functions; the code looks like this:
import data

def main(args_dict):
    print(args_dict)
    df1 = data.get_df1(args_dict['df1name'])
    df2 = data.get_df2(args_dict['df2name'])
    ...
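For reference, data.py just defines small DataFrame helpers. A minimal sketch of its shape (the real bodies are longer; the spark.table lookups here are only illustrative):

# data.py -- minimal sketch of the helpers etl.py calls.
from pyspark.sql import SparkSession

def get_df1(name):
    # Illustrative body: look up a registered table by name.
    return SparkSession.builder.getOrCreate().table(name)

def get_df2(name):
    return SparkSession.builder.getOrCreate().table(name)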
I'm passing data.py via --py-files, but when I run spark-submit I get ImportError: No module named 'data'.
What am I doing wrong here? Thank you.