pyspark: IOError: [Errno 20] Not a directory (egg file import)

Asked Feb 02 '17 at 23:12

I am trying to run a PySpark job with an .egg file. Some of the code in the egg reads .json files that are packaged inside the same .egg file.

I get the absolute path of my .py file (in the .egg file) using os.path.dirname(__file__) and then append the relative path of the .json file to it.

The path looks like:

/private/var/folders/8b/85wbwwxn2n31zfl1dgcpcfxs1d0qjg/T/spark-347c3633-7d95-467c-a222-83965afc7f34/userFiles-d0c02f9a-3c54-4f50-bb14-550a1bdcc26b/normalize-3.0-py3.5.egg/i18naddress/data/us.json

My directory structure is:

normalize-3.0-py3.5.egg
--i18naddress
----__init__.py (the class I call)
----data
------us.json

The path looks correct, but Spark is unable to read these files and throws an IOError. Can someone tell me what I am doing wrong?

Anuja Khemka

Looks like you were asking how to read files in an egg file. See http://stackoverflow.com/questions/3655352/how-to-access-files-inside-a-python-egg-file – zsxwing Feb 02 '17 at 23:40
Thanks zsxwing, this works!!! – Anuja Khemka Feb 03 '17 at 18:56
Thanks that helped me too! – void Nov 14 '18 at 14:45
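
For later readers, here is a minimal sketch of the approach the linked question points to. An .egg is normally a zip archive, so a path that runs "through" it (…/normalize-3.0-py3.5.egg/i18naddress/data/us.json) is not a real directory on disk, which is probably why open() fails with Errno 20. One option is to ask the package's import loader for the bytes instead, using the standard library's pkgutil.get_data. The helper name load_packaged_json is made up for illustration, and the sketch assumes the egg is already importable on the executor (for example, shipped with --py-files).

```python
import json
import pkgutil


def load_packaged_json(package, resource):
    """Load a JSON resource shipped inside a package (hypothetical helper).

    `package` is the importable package name and `resource` is a path
    relative to that package, always written with '/' separators.
    This goes through the import loader, so it can work whether the
    package lives in a plain directory or inside a zipped .egg.
    """
    raw = pkgutil.get_data(package, resource)
    if raw is None:
        # get_data returns None when the package cannot be located or
        # its loader does not support reading data.
        raise FileNotFoundError(
            "cannot load %r from package %r" % (resource, package)
        )
    return json.loads(raw.decode("utf-8"))
```

With the layout from the question, the call would presumably look like load_packaged_json("i18naddress", "data/us.json") in place of building a path from os.path.dirname(__file__).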