After having created a TensorFlow 1.4 model for Python 3, I have now found that Google Cloud ML Engine currently only has support for Python 2.7.

Back-porting my Python 3 code at first seemed simple enough: some scripts still work as expected when I replace their shebang `#!/usr/bin/env python3` with `#!/usr/bin/env python`. `python -V` reports 2.7.10 in my (macOS) environment.

Yet one script does not react so gracefully. When I run it now, it produces a Segmentation fault: 11 without any previous warnings or other diagnostic output.

How can I find the root cause, so that I know what else to change to make that script run under Python 2 as well?

UPDATE: The segmentation fault apparently occurs during a call to `session.run(get_next)`, where `get_next` is obtained from a `tf.data.Iterator` as follows:

iterator = dataset.make_initializable_iterator()
get_next = iterator.get_next()
Drux

1 Answer

There are two issues here: one is about Python 3 support and the other is about the segfault.

Python 3 support: Cloud ML Engine now supports Python 3 via the `pythonVersion` field when submitting jobs (see the API reference docs).

If you are using gcloud, you will need to create a config file like this (let's name it config.yaml):

trainingInput:
  pythonVersion: "3.5"

When you submit your job, point gcloud to that file, e.g.

gcloud ml-engine jobs submit training --config=config.yaml ...
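
To confirm that the job really runs under the interpreter you asked for, the trainer can report its own version at startup. This is only a minimal sketch: the helper names `describe_interpreter` and `require_major` are made up for illustration and are not part of any Cloud ML Engine API.

```python
import sys


def describe_interpreter():
    """Return the running interpreter version as 'major.minor.micro'."""
    return "{}.{}.{}".format(*sys.version_info[:3])


def require_major(expected_major):
    """Fail early if the job runs under an unexpected major version.

    expected_major is a hypothetical parameter, e.g. 3 for a job
    submitted with pythonVersion "3.5".
    """
    actual = sys.version_info[0]
    if actual != expected_major:
        raise RuntimeError(
            "Expected Python {} but running under {}".format(
                expected_major, describe_interpreter()
            )
        )


if __name__ == "__main__":
    # A single-string print call behaves the same on Python 2 and 3.
    print("Running under Python " + describe_interpreter())
```

Calling `require_major(3)` at the top of the trainer turns a silent version mismatch into an explicit error in the job log.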

Segfault: This may be caused by running out of memory. Please check the memory usage in the console for that job. That said, if the job dies abruptly, the console may not accurately reflect memory usage at the time of failure.
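
When reproducing the crash locally, it can help to get a Python-level traceback and a rough peak-memory figure before the process dies. The sketch below assumes the `faulthandler` module is importable: it is in the standard library from Python 3.3 on, while on Python 2.7 it would have to be installed separately (a backport with the same name is, as far as I know, published on PyPI). The helper names are illustrative, and `resource` is only available on Unix-like systems.

```python
import resource
import sys

try:
    import faulthandler  # stdlib on Python 3.3+; assumed installed separately on 2.7
except ImportError:
    faulthandler = None


def enable_crash_tracebacks(stream=sys.stderr):
    """Dump Python tracebacks to `stream` on SIGSEGV and similar fatal signals.

    Returns False when faulthandler is unavailable, True otherwise.
    `stream` must be a real file object with a file descriptor.
    """
    if faulthandler is None:
        return False
    faulthandler.enable(file=stream, all_threads=True)
    return True


def peak_rss_raw():
    """Return the peak resident set size reported by the OS so far.

    The unit is platform dependent: kilobytes on Linux, bytes on macOS.
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


if __name__ == "__main__":
    enable_crash_tracebacks()
    print("Peak RSS so far (platform-dependent unit): {}".format(peak_rss_raw()))
```

With tracebacks enabled before the call to `session.run(get_next)`, a segfault should at least show which Python line was executing, and logging `peak_rss_raw()` between steps gives a hint whether memory is growing.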

Drux
rhaertel80
  • Excellent re Python 3. Re segfault, I have by now found out that the model can't open the dataset file. It is still picking it up from a relative path, which worked outside Cloud ML, but apparently is not good practice with `gcloud ml-engine local train`. Maybe I'll post a follow-up question on this. – Drux Dec 22 '17 at 15:39
  • FYI: Follow-up question now posted [here](https://stackoverflow.com/questions/47952143/does-google-cloud-ml-engine-trainer-has-to-be-explicitly-aware-of-google-cloud-s). – Drux Dec 23 '17 at 18:07
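
Given the relative-path problem described in the comment above, one way to get a readable error instead of a crash is to resolve and check the dataset path before building the `tf.data` pipeline. This is a sketch for local files only: `resolve_data_path` and its `base_dir` parameter are hypothetical names, and `os.path` does not understand Cloud Storage (`gs://`) locations.

```python
import os


def resolve_data_path(path, base_dir):
    """Return an absolute path for `path`, raising if the file is missing.

    Relative paths are resolved against `base_dir` (a hypothetical
    parameter) rather than the current working directory, which may
    differ between a plain local run and `gcloud ml-engine local train`.
    """
    if os.path.isabs(path):
        full = path
    else:
        full = os.path.join(base_dir, path)
    full = os.path.abspath(full)
    if not os.path.isfile(full):
        raise IOError(
            "Dataset file not found: {} (cwd is {})".format(full, os.getcwd())
        )
    return full
```

Passing the directory of the trainer module as `base_dir`, for example `os.path.dirname(os.path.abspath(__file__))`, keeps the lookup independent of where the process was started.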