
When running a job on GCP's AI Platform, I get an error when importing a local module, even though there is an __init__.py in the module directory.

Output from the job:

The replica master 0 exited with a non-zero status of 1. 
Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/.local/lib/python3.7/site-packages/SemBERT/run_classifier.py", line 19, in <module>
    from tag_model.modeling import TagConfig
ModuleNotFoundError: No module named 'tag_model'

However, if I build the package locally and then unzip it, I can import the module.

Structure of the directory I'm creating the job from:

├── SemBERT
│   ├── README.md
│   ├── SemBERT.png
│   ├── __init__.py
│   ├── __pycache__
│   │   └── __init__.cpython-37.pyc
│   ├── data_process
│   │   ├── __init__.py
│   │   ├── datasets.py
│   │   └── util.py
│   ├── glue_data
│   │   └── MNLI
│   │       ├── dev_matched.tsv_tag_label
│   │       ├── test_matched.tsv_tag_label
│   │       └── train.tsv_tag_label
│   ├── pytorch_pretrained_bert
│   │   ├── __init__.py
│   │   ├── __main__.py
│   │   ├── file_utils.py
│   │   ├── modeling.py
│   │   ├── optimization.py
│   │   └── tokenization.py
│   ├── run_classifier.py
│   ├── run_scorer.py
│   ├── run_snli_predict.py
│   └── tag_model
│       ├── __init__.py
│       ├── __pycache__
│       │   ├── __init__.cpython-37.pyc
│       │   └── modeling.cpython-37.pyc
│       ├── modeling.py
│       ├── tag_tokenization.py
│       ├── tagger_offline.py
│       └── tagging.py
├── setup.py
└── training.ipynb

--package-path is set to SemBERT.

I have packaged locally and have tested the import.
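
One thing that might explain the difference, offered only as a sketch: the traceback goes through runpy's `_run_module_as_main`, which suggests the job runs the script with `python -m` from the installed package rather than as a plain script. In that mode the package directory itself is not put on `sys.path`, so a top-level `import tag_model` may not resolve. The snippet below reproduces that behaviour with a throwaway package; `demo_pkg` and the helper names are hypothetical stand-ins, not part of SemBERT.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def build_demo_package(root: Path) -> None:
    """Create a tiny stand-in for the SemBERT layout (names are hypothetical)."""
    pkg = root / "demo_pkg"
    sub = pkg / "tag_model"
    sub.mkdir(parents=True)
    (pkg / "__init__.py").write_text("")
    (sub / "__init__.py").write_text("")
    (sub / "modeling.py").write_text("class TagConfig:\n    pass\n")
    # Same import style as run_classifier.py: absolute, without the package prefix.
    (pkg / "run_classifier.py").write_text(
        "from tag_model.modeling import TagConfig\nprint('import ok')\n"
    )


def run_python(args, cwd):
    """Run the current interpreter with the given arguments and capture output."""
    return subprocess.run(
        [sys.executable, *args], cwd=cwd, capture_output=True, text=True
    )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    build_demo_package(root)
    # Plain script: sys.path[0] is the script's directory, so tag_model is found.
    as_script = run_python([str(Path("demo_pkg") / "run_classifier.py")], cwd=root)
    # Module mode: sys.path[0] is the working directory, so tag_model is not top-level.
    as_module = run_python(["-m", "demo_pkg.run_classifier"], cwd=root)

print("script exit code:", as_script.returncode)
print("module exit code:", as_module.returncode)
print(as_module.stderr.strip().splitlines()[-1])
```

If this is what is happening in the job, importing through the package name (for example `from SemBERT.tag_model.modeling import TagConfig`) would be one thing to try. I have not verified this against the actual training job.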

Source code can be found here

Liam Pieri
  • I used the Python shell to import the class with `from tag_model.modeling import TagConfig`. The import worked without any issue, and I could create an object using `obj = TagConfig(tag_vocab_size=5)`. Could you try importing it manually from the Python shell? If that works, try running the code from a different path. I found a [post](https://stackoverflow.com/questions/43728431/relative-imports-modulenotfounderror-no-module-named-x) that may be useful. – July Dec 07 '20 at 20:03
  • Thanks for the effort. I should've mentioned that in the comment. I have imported it locally with success. – Liam Pieri Dec 07 '20 at 23:05
  • Do you mean that the code runs successfully locally and the error only happens in a [GCP's AI Platform instance](https://cloud.google.com/ai-platform/notebooks/docs/create-new)? If so, what are the instance specifications? I will try to replicate the issue. – July Dec 08 '20 at 00:42
  • The error is occurring in a [training job](https://cloud.google.com/ai-platform/training/docs/overview) with the following [setup.py](https://gist.github.com/liamwazherealso/645e1883ff567022a1ae7e058eb1df99). – Liam Pieri Dec 08 '20 at 01:00
  • Sorry for the delayed response. I tried to replicate the issue but didn't get the same behavior. Could you share how you are submitting the job? I submitted mine using [gcloud](https://cloud.google.com/ai-platform/training/docs/training-jobs#submit-job). – July Dec 30 '20 at 01:03
