I have 4 Python scripts and one .txt configuration file. Of the 4 Python files, one is the entry point for the Spark application and imports functions from the other Python files. The configuration file, however, is read by one of the other Python files, not by the entry point. I want to write the spark-submit command for this PySpark application, but I am not sure how to pass multiple Python files along with the configuration file when the configuration file is not a Python file but a text/ini file.
For demonstration: 4 Python files: file1.py, file2.py, file3.py, file4.py
1 configuration file: conf.txt
file1.py: this file creates the Spark session and calls into all the other Python files. file3.py: this file reads conf.txt.
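For context, file3.py reads conf.txt with ordinary file I/O, roughly like the sketch below (the function name and the ini-format assumption are mine, not the real code):

# file3.py (simplified sketch; load_conf is a placeholder name)
import configparser

def load_conf(path='conf.txt'):
    # conf.txt is ini-style, so configparser can parse it
    parser = configparser.ConfigParser()
    parser.read(path)
    return parser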
I want to provide all these files with spark-submit, but I am not sure about the exact command. The command I have tried is below:
'Args': ['spark-submit',
'--deploy-mode', 'cluster',
'--master', 'yarn',
'--executor-memory',
conf['emr_step_executor_memory'],
'--executor-cores',
conf['emr_step_executor_cores'],
'--conf',
'spark.yarn.submit.waitAppCompletion=true',
'--conf',
'spark.rpc.message.maxSize=1024',
f'{s3_path}/file1.py',
'--py-files',
f'{s3_path}/file2.py',
f'{s3_path}/file3.py',
f'{s3_path}/file4.py',
'--files',
f'{s3_path}/conf.txt'
]
But the above command throws the following error:

File "file1.py", line 3, in <module>
    from file2 import *
ModuleNotFoundError: No module named 'file2'
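From the spark-submit usage docs, I suspect this is an argument-order problem: everything placed after the application file (file1.py here) is passed as arguments to the application itself, so spark-submit never sees --py-files or --files. If that is right, the fix (a sketch I have not run yet) would be to move the options before file1.py and comma-separate the file lists:

'Args': ['spark-submit',
         '--deploy-mode', 'cluster',
         '--master', 'yarn',
         '--executor-memory', conf['emr_step_executor_memory'],
         '--executor-cores', conf['emr_step_executor_cores'],
         '--conf', 'spark.yarn.submit.waitAppCompletion=true',
         '--conf', 'spark.rpc.message.maxSize=1024',
         # all options must come before the application file,
         # and multiple files are passed as one comma-separated string
         '--py-files', f'{s3_path}/file2.py,{s3_path}/file3.py,{s3_path}/file4.py',
         '--files', f'{s3_path}/conf.txt',
         f'{s3_path}/file1.py']  # application entry point goes last

If that is correct, --files should also place conf.txt in the working directory of the driver and executors, so file3.py can keep opening it by its bare name.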