I am trying to run a PySpark job with an .egg file. Some of the code in the .egg file references .json files that are also packaged inside it.
I get the absolute path of the directory containing my .py file (inside the .egg file) using os.path.dirname(__file__) and then append the relative path of the .json file to it.
The path looks like:
/private/var/folders/8b/85wbwwxn2n31zfl1dgcpcfxs1d0qjg/T/spark-347c3633-7d95-467c-a222-83965afc7f34/userFiles-d0c02f9a-3c54-4f50-bb14-550a1bdcc26b/normalize-3.0-py3.5.egg/i18naddress/data/us.json
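The lookup amounts to roughly the following. This is a minimal sketch rather than my exact code: the helper name `data_path` and its parameters are made up for illustration, and it assumes the .json files sit in a `data` folder next to the module.

```python
import os


def data_path(module_file, country_code):
    """Build the path to a bundled JSON file next to the given module file.

    `module_file` is expected to be the module's __file__ value.
    (Hypothetical helper, named here only for illustration.)
    """
    # Directory that contains the module, e.g. .../normalize-3.0-py3.5.egg/i18naddress
    package_dir = os.path.dirname(os.path.abspath(module_file))
    # Append the relative location of the data file, e.g. data/us.json
    return os.path.join(package_dir, "data", country_code + ".json")
```

Inside the package this would be called as `data_path(__file__, "us")`, and the result is then passed to `open()`, which is where the error is raised.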
My directory structure is:
normalize-3.0-py3.5.egg
--i18naddress
----__init__.py (the class I call)
----data
------us.json
The path looks correct, but Spark is unable to read these files and throws an IO error. Can someone tell me what I am doing wrong?