
I am having an issue with the structure of my Python project on Google Cloud Dataproc. I have a number of files, all in the same folder, that import one another. The overall program runs fine locally.

However, when I run it on Google Cloud Dataproc, the imports fail. I have tried the answers to the question "Python can't find my module", but to no effect.

The error is the following:

from model import PolicyEmergence
ImportError: No module named model

I tried to force the path using sys.path.insert(0, 'gs://bucket-name/'), but to no avail. I am not sure whether this is due to the path changing every time I run the job.

Any help would be welcome, thanks.

Kl1
  • As Toorusr mentions, you can't import files from GCS, only ones on the local filesystem. `pyspark` (and `gcloud dataproc jobs submit pyspark`) let you specify `--py-files` that should be available for import. Here is the documentation: https://spark.apache.org/docs/latest/submitting-applications.html. What command are you using to submit your job? – Karthik Palaniappan Nov 13 '17 at 21:03
  • Thanks for the comment. This helped me pinpoint the problem. I am using the console directly and thought that specifying only the main file would lead GCS to locate all the other files in the same folder. When I specify each of the files separately, the problem is resolved. Thanks for the link; I'll use it for bundling, as I have too many files to list each time. – Kl1 Nov 14 '17 at 06:21
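
Following the `--py-files` suggestion, one way to avoid listing every file is to zip the helper modules into a single archive; Spark adds `.py`, `.zip` and `.egg` files passed via `--py-files` to the Python path, so top-level modules inside the zip become importable. The sketch below is a minimal example under assumptions: a flat project folder whose entry point is a hypothetical `main.py` (submitted separately), and a local `sys.path.insert` standing in for what the cluster does with the archive.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def bundle_py_files(project_dir, zip_path, entry_point="main.py"):
    """Zip the top-level .py files of project_dir, except the entry point.

    Files are stored at the archive root so that, once the zip is on the
    Python path, "from model import PolicyEmergence" resolves as it does locally.
    """
    added = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in sorted(os.listdir(project_dir)):
            if name.endswith(".py") and name != entry_point:
                zf.write(os.path.join(project_dir, name), arcname=name)
                added.append(name)
    return added


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        project = os.path.join(tmp, "project")
        os.mkdir(project)
        # Hypothetical project layout: an entry point plus one helper module
        with open(os.path.join(project, "main.py"), "w") as f:
            f.write("from model import PolicyEmergence\n")
        with open(os.path.join(project, "model.py"), "w") as f:
            f.write("class PolicyEmergence:\n    pass\n")

        zip_path = os.path.join(tmp, "project.zip")
        print(bundle_py_files(project, zip_path))  # ['model.py']

        # Mimic --py-files locally: put the archive on the import path
        sys.path.insert(0, zip_path)
        model = importlib.import_module("model")
        print(model.PolicyEmergence)
        sys.path.remove(zip_path)
```

The resulting archive would then be passed with `--py-files` alongside the main script when submitting the job; the exact cluster and region flags depend on your setup.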

2 Answers


You probably want:

from model import PolicyEmergence
alexisdevarennes
  • Also, OP, make sure that the file is within the same directory you are importing it from. – N. Ivanov Nov 13 '17 at 16:04
  • Yes, sorry, the "from model.py" was a test I was doing. I corrected my question. As for the files, they are all in the same folder (bucket) on Google Cloud. – Kl1 Nov 13 '17 at 16:18

The from [...] import [...] statement takes, first, the module path ([package.]module, i.e. the file containing your classes, without the .py extension) and, second, the name of the class you want to import from it, or * to import everything.

For example:

     # The .py must be omitted! 
     from mymodule import *

To import your class (PolicyEmergence) from that file (model.py), drop the .py:

from model import PolicyEmergence

Toorusr
  • Thanks for the answer. As mentioned before, the .py was added in one of my tests; removing it does not change anything. As for using "from mymodule import *", the same error occurs: "from model import *" gives ImportError: No module named model – Kl1 Nov 13 '17 at 16:32
  • I think importing the files directly from the cloud is not possible. Maybe you can first download them and then place them in your local folder? – Toorusr Nov 13 '17 at 17:23
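
This also explains why the sys.path.insert(0, 'gs://bucket-name/') attempt in the question had no effect: Python's default path-based import machinery only searches sys.path entries that are local directories or zip archives, and any other entry, such as a gs:// URL, is silently skipped. A minimal sketch, assuming Python 3 and using a throwaway model.py as a hypothetical stand-in for the real file, checks a single path entry with `importlib.machinery.PathFinder.find_spec`:

```python
import importlib.machinery
import os
import tempfile


def can_find(module_name, path_entry):
    """Return True if the path-based finder locates module_name in path_entry."""
    # PathFinder searches only the entries passed here, applying the same
    # path hooks (directories, zip archives) that ordinary imports use.
    spec = importlib.machinery.PathFinder.find_spec(module_name, [path_entry])
    return spec is not None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as local_dir:
        # Hypothetical stand-in for the project's model.py
        with open(os.path.join(local_dir, "model.py"), "w") as f:
            f.write("class PolicyEmergence:\n    pass\n")
        print(can_find("model", local_dir))            # True: local directory
        print(can_find("model", "gs://bucket-name/"))  # False: not a local path
```

Getting the modules onto the local filesystem of the driver first, whether by downloading them or by shipping them with the job, is what makes the import resolvable.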