
I am trying to run a PySpark job with an .egg file. The egg contains code that reads .json files packaged inside the same .egg file.

I get the absolute path of my .py file (inside the .egg file) using os.path.dirname(__file__) and then append the relative path of the .json file to it.
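A minimal sketch of how the path is built, assuming the module sits next to the data directory. The names PACKAGE_DIR and data_path are only illustrative and not the actual code in the egg:

```python
import os

# Directory of the current module. Inside the package this is the
# i18naddress directory. The fallback only exists so the sketch also
# runs where __file__ is undefined (for example in a REPL).
try:
    PACKAGE_DIR = os.path.dirname(os.path.abspath(__file__))
except NameError:
    PACKAGE_DIR = os.getcwd()


def data_path(country_code):
    """Return the expected location of data/<country_code>.json."""
    return os.path.join(PACKAGE_DIR, "data", country_code + ".json")


us_path = data_path("us")
```

When the module is loaded from the egg, PACKAGE_DIR points inside the .egg file, which gives a path like the one shown next.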

The path looks like:

/private/var/folders/8b/85wbwwxn2n31zfl1dgcpcfxs1d0qjg/T/spark-347c3633-7d95-467c-a222-83965afc7f34/userFiles-d0c02f9a-3c54-4f50-bb14-550a1bdcc26b/normalize-3.0-py3.5.egg/i18naddress/data/us.json

My directory structure is:

normalize-3.0-py3.5.egg
--i18naddress
----__init__.py (the class I call)
----data
------us.json

The path looks correct, but Spark is unable to read these files and throws an IOError. Can someone tell me what I am doing wrong?
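One possible explanation, assuming the egg is shipped as a zipped archive (as is typically the case with --py-files): a path that runs through the .egg is not a real file on disk, so open() cannot read it. The sketch below reproduces this with a throwaway archive and reads the resource through pkgutil.get_data instead. The names demo.egg and demo_pkg are hypothetical stand-ins for the real egg and package:

```python
import json
import os
import pkgutil
import sys
import tempfile
import zipfile

# Build a throwaway archive that mimics the egg layout from the question.
tmp_dir = tempfile.mkdtemp()
egg_path = os.path.join(tmp_dir, "demo.egg")  # hypothetical file name
with zipfile.ZipFile(egg_path, "w") as zf:
    zf.writestr("demo_pkg/__init__.py", "")
    zf.writestr("demo_pkg/data/us.json", json.dumps({"country": "US"}))

# Putting the archive on sys.path makes Python import from it via zipimport.
sys.path.insert(0, egg_path)

# A path through the archive is not a real file, so open() would fail on it.
inner_path = os.path.join(egg_path, "demo_pkg", "data", "us.json")
path_exists = os.path.exists(inner_path)

# pkgutil.get_data asks the package's loader for the bytes instead,
# which also works when the package lives inside a zip archive.
raw = pkgutil.get_data("demo_pkg", "data/us.json")
record = json.loads(raw.decode("utf-8"))
```

If this is the cause, replacing the open() call in the package with pkgutil.get_data (or unpacking the egg before use) may avoid the error. This is a guess at the cause, not a confirmed diagnosis.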

Anuja Khemka

0 Answers