
I am trying to use Google Cloud ML to host a Tensorflow model and get predictions. I have a pretrained model that I have uploaded to the cloud and I have created a model and version in my Cloud ML console.

I followed the instructions from here to prepare my data for requesting online predictions. For both the Python method and the gcloud method I get the same error. For simplicity, I'll post the gcloud method:

I run `gcloud ml-engine predict --model spell_correction --json-instances test.json`, where `test.json` is my input data file (a JSON array named `instances`). I get the following result:

ERROR: (gcloud.ml-engine.predict) HTTP request failed. Response: {
  "error": {
  "code": 400,
  "message": "Precondition check failed.",
  "status": "FAILED_PRECONDITION"
  }
}

How can I get more details about this? The exact same error happens when I try via Python, where I have a `googleapiclient.http.HttpRequest` object containing the error. I want to know the specific cause behind this generic error. Does anyone know how to get more details via either the Python method or the gcloud method? I am assuming that since it is the same error, it has the same root cause.

Output of `gcloud ml-engine models list`:

NAME              DEFAULT_VERSION_NAME
spell_correction  testing

Output of `gcloud ml-engine versions list --model spell_correction`:

NAME     DEPLOYMENT_URI
testing  gs://<my-bucket>/output/1/

test.json: {"instances": [{"tokens": [[9], [4], [11], [9]], "mask": [[18], [7], [12], [30]], "keep_prob": 1.0, "beam": 64}]}

My inputs to the model:

tokens: tf.placeholder(tf.int32, shape=[None, None])

mask: tf.placeholder(tf.int32, shape=[None, None])

keep_prob: tf.placeholder(tf.float32)

beam: tf.placeholder(tf.int32)

When calling via Python, the `request_body` is just the contents of `test.json` as a string.

jbird
  • Can you run `gcloud ml-engine models list` as well as `gcloud ml-engine versions list --model spell_correction` to verify the model was successfully created? – rhaertel80 Mar 13 '17 at 14:32
  • Added the output to the question – jbird Mar 13 '17 at 14:48
  • Can you post what is being sent in test.json and what is being sent over as request through python? – Bhupesh Mar 14 '17 at 15:13
  • Added to the question @Bhupesh – jbird Mar 14 '17 at 15:28
  • I sent an email. Would changing the shape break my checkpoints though? I already had to start over with the upgrade to 1.0, I would like to avoid doing that again if possible – jbird Mar 14 '17 at 19:18
  • Also, I recommend creating a graph for prediction that does NOT include the dropout layer, rather than setting keep_prob = 1.0. – rhaertel80 Mar 15 '17 at 04:32
  • Your checkpoints should generally be fine and you shouldn't have to retrain. In theory, you just need to regenerate the saved_model.pb. That said, it's not particularly easy to do so. I added a post on that: http://stackoverflow.com/questions/42801551/how-do-i-change-the-signatures-of-my-savedmodel-without-retraining-the-model/42801552#42801552 – rhaertel80 Mar 15 '17 at 05:12

2 Answers


A side note: did you try "local predict" (https://cloud.google.com/sdk/gcloud/reference/ml-engine/local/predict) with your model first? You might be able to get more information there first.

roger
  • "Only debian based systems are supported at this time". I am on a Mac. Is there just a details property under the `googleapiclient.http.HttpRequest` object? That seems like it should have something in it. – jbird Mar 14 '17 at 13:07
  • Tried it anyway, got `no module named ml` when trying `import cloud.google.ml` – jbird Mar 14 '17 at 13:15
  • It's not officially supported on macOS, but it should work. The error you saw was because it couldn't find the CloudML SDK (which we are getting rid of). It should be fixed very soon, perhaps in the next one or two gcloud releases, and then you should be able to run the local prediction. – roger Mar 14 '17 at 19:19
  • In general, local prediction is helpful in debugging the model and/or input, since it shares some common code path with online prediction. For the moment, you could probably get it to work after installing the CloudML SDK: `pip install --upgrade --force-reinstall https://storage.googleapis.com/cloud-ml/sdk/cloudml.latest.tar.gz` – roger Mar 14 '17 at 19:32
  • Installing worked without `--force-reinstall` (Macs are weird about `six`, even in Anaconda). When trying to predict locally, I got an error saying `RuntimeError: Expected meta graph file missing ~/PycharmProjects/nlc/export/1/export.meta`. However, there were lots of messages saying `Please use SavedModel instead`, which I did. I thought SavedModel exports a `.pb` file and a `variables` folder, not a `.meta` file? – jbird Mar 14 '17 at 20:33
  • See https://tensorflow.github.io/serving/serving_basic.html, towards the end. It says: "`saved_model.pb` is the serialized tensorflow::SavedModel. It includes the one or more graph definitions of the model, as well as metadata of the model such as signatures." So shouldn't that mean the metagraph is inside of the `.pb` file? – jbird Mar 14 '17 at 20:38
  • This is the same bug in gcloud which will be fixed in tomorrow's release: http://stackoverflow.com/questions/42788485/import-error-no-module-named-cloud-ml/42789029?noredirect=1#comment72693624_42789029. – rhaertel80 Mar 15 '17 at 04:14
  • Just updated the code and I am getting the same error as before – jbird Mar 15 '17 at 19:28

After talking to Google Cloud ML support, I got this working.

The main issue I noticed was that all of the data in test.json gets wrapped in a list when it is sent to your model. I solved this by removing the outer list from tokens and mask in my file above. I also changed keep_prob and beam to constants as I do not want them to be able to change for every prediction I make.
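
A minimal sketch of why the extra nesting matters, in plain Python with no TensorFlow required. It assumes the service stacks the instances along a new leading batch dimension, which is one way to read "gets wrapped in a list". The helpers `nested_shape` and `batched_shape` are hypothetical, and the flattened `tokens` and `mask` values are only an illustration of dropping one level of nesting, not the exact file that was used.

```python
import json


def nested_shape(value):
    """Return the shape of a rectangular nested list (a scalar has shape [])."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        value = value[0] if value else None
    return shape


def batched_shape(instances, key):
    """Shape seen for `key` if instances are stacked on a new batch axis (assumption)."""
    return [len(instances)] + nested_shape(instances[0][key])


# Input values as posted in the question (keep_prob and beam omitted).
original = {"instances": [{"tokens": [[9], [4], [11], [9]],
                           "mask": [[18], [7], [12], [30]]}]}

# Hypothetical fix: one level of nesting dropped from each input.
fixed = {"instances": [{"tokens": [9, 4, 11, 9],
                        "mask": [18, 7, 12, 30]}]}

# The placeholders are declared with shape [None, None], i.e. rank 2.
print(batched_shape(original["instances"], "tokens"))  # rank 3: one dimension too many
print(batched_shape(fixed["instances"], "tokens"))     # rank 2: matches the placeholder

# Body for the Python request, serialized as a string.
request_body = json.dumps(fixed)
```

Under that assumption, the originally posted values arrive with one more dimension than the rank-2 placeholders accept, while the flattened values line up with `[None, None]`.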

As a general piece of advice, the error messages provided through the Python call were much more useful to me than the error messages from `gcloud ml-engine predict`. Also, be sure to keep your gcloud install up to date, as they are working on fixes almost constantly.
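
One way to surface whatever detail the service returns is to parse the JSON error body instead of reading only the top-level message. The sketch below uses only the standard library. `describe_error` is a hypothetical helper, and the `details` field is an assumption about what a more specific response might contain; the sample body is the one from the question, which has no such field.

```python
import json


def describe_error(content):
    """Summarize a JSON error body (bytes or str) into readable lines."""
    if isinstance(content, bytes):
        content = content.decode("utf-8", errors="replace")
    try:
        error = json.loads(content).get("error", {})
    except (ValueError, AttributeError):
        # Not JSON, or not a JSON object: return the raw text unchanged.
        return content
    if not isinstance(error, dict):
        # Some responses carry the error as a plain string.
        return str(error)
    lines = ["{} {}: {}".format(error.get("code"),
                                error.get("status"),
                                error.get("message"))]
    # "details" is optional; print each entry if the service supplied any.
    for detail in error.get("details", []):
        lines.append("detail: " + json.dumps(detail, sort_keys=True))
    return "\n".join(lines)


# Error body from the question, as raw bytes.
sample = (b'{"error": {"code": 400, "message": "Precondition check failed.", '
          b'"status": "FAILED_PRECONDITION"}}')
print(describe_error(sample))
```

With the Google API Python client, the usual pattern would be to wrap `request.execute()` in `try`/`except googleapiclient.errors.HttpError as err` and pass `err.content` to a helper like this one.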

jbird
  • What specifically did you do in order to solve this problem? My data doesn't really fit into the list situation you mentioned (I'm essentially using the architecture from [this tutorial](https://cloud.google.com/blog/big-data/2016/12/how-to-classify-images-with-tensorflow-using-google-cloud-machine-learning-and-cloud-dataflow)). Was your model working properly with `gcloud ml-engine local predict` before your fix? – kbhomes Mar 21 '17 at 22:15
  • I actually never got it working through a gcloud command. I just used the Python method to make requests to Cloud ML. I don't know if it is possible to call a local model like that. Whenever I used `gcloud ml-engine [local] predict` I would get an error like `Incorrect types, 0 vs. object`. I never found out what that meant either. – jbird Mar 22 '17 at 19:25