
I am trying to deploy my code on Cloud Dataproc. My app is made up of two modules, moduleA.py and moduleB.py; moduleA imports a function from moduleB. I have uploaded both modules to the same bucket, but when I kick off my Dataproc template, Dataproc complains that it cannot find moduleB. What extra steps do I need to follow so that moduleA can see moduleB on Dataproc?

Kind regards

user1068378
    Could you share how you're invoking Dataproc? And perhaps how the modules are declared? – tix Oct 16 '19 at 20:30
  • This answer has lots of details on declaring and importing modules: https://stackoverflow.com/questions/53863576 – tix Oct 16 '19 at 20:33
  • Hello, thanks for getting back. So, I have two modules, main.py and external.py, both stored in gs://mybucket. main.py has this import: from external import external_function. I have set up a template that I am using to invoke Dataproc, and this is the command line I have attempted, with no success: – user1068378 Oct 17 '19 at 21:14
  • gcloud dataproc workflow-templates add-job pyspark gs://mm_dataproc/quixote_sorted.py --step-id=quixo --py-files gs://mm_dataproc/external.py --workflow-template=quixote_dtp_template_5 -- gs://mm_dataproc/donquixote.txt – user1068378 Oct 17 '19 at 21:17
  • Could you include the error in the post? And also, how are the modules declared? – tix Oct 17 '19 at 22:52

1 Answer


Apologies to all... I think I had some other unrelated errors in one of the steps that I thought I had deleted; it was nothing to do with dependencies. I managed to get a successful run by packaging the dependencies in a zip and running with --py-files gs://mydeps.zip .....
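
For anyone who hits the same problem, here is a minimal sketch of that approach, reusing the bucket and file names from the comments above (gs://mm_dataproc, external.py, quixote_sorted.py); the archive name deps.zip is just an illustration:

    # Package the dependency module at the root of a zip, so that
    # "from external import external_function" resolves on the workers.
    zip deps.zip external.py

    # Upload the archive next to the driver script.
    gsutil cp deps.zip gs://mm_dataproc/deps.zip

    # Register the PySpark job in the workflow template; --py-files
    # distributes the zip to every node before the job runs.
    gcloud dataproc workflow-templates add-job pyspark \
        gs://mm_dataproc/quixote_sorted.py \
        --step-id=quixo \
        --py-files=gs://mm_dataproc/deps.zip \
        --workflow-template=quixote_dtp_template_5 \
        -- gs://mm_dataproc/donquixote.txt

Whether you pass plain .py files or a zip, it is --py-files that ships the modules to the executors; simply having the files in the bucket is not enough.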

Kind regards

user1068378