
We have been able to deploy models (both custom prediction routines and the TensorFlow SavedModel format) to AI Platform Prediction, and basic testing shows things are at least functional for online predictions. We are now trying to load test a bit before putting this in production, and we are running into some stability issues.

We are seeing a variety of errors:

- 429 - "Rate of traffic exceeds serving capacity. Decrease your traffic or reduce the size of your model"
- 503 - "upstream connect error or disconnect/reset before headers. reset reason: connection failure"
- 504 - "Timed out waiting for notification."

We've implemented an exponential backoff approach (a simplified sketch is below), and that generally works to resolve the above issues over time. However, we want to make sure we understand what's going on.

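For reference, here's roughly what our retry wrapper looks like. This is a simplified sketch using the google-api-python-client; the project and model names are placeholders, and the retry count / backoff values are just what we happened to pick:

```python
import random
import time

from googleapiclient import discovery, errors

# Placeholders -- substitute your own project and model names.
PROJECT = "my-project"
MODEL = "my-model"

service = discovery.build("ml", "v1")
name = "projects/{}/models/{}".format(PROJECT, MODEL)


def predict_with_backoff(instances, max_retries=6):
    """Call online prediction, retrying 429/503/504 with exponential backoff."""
    for attempt in range(max_retries):
        try:
            response = service.projects().predict(
                name=name, body={"instances": instances}
            ).execute()
            if "error" in response:
                raise RuntimeError(response["error"])
            return response["predictions"]
        except errors.HttpError as e:
            if e.resp.status in (429, 503, 504) and attempt < max_retries - 1:
                # Exponential backoff with a little jitter before retrying.
                time.sleep((2 ** attempt) + random.uniform(0, 1))
            else:
                raise
```
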
The 429s seem straightforward - wait for things to scale.

The 503 / 504 errors we're less sure about, both in terms of cause and how to resolve or eliminate them. We have played with batch size (per TensorFlow model serving on Google AI Platform online prediction too slow with instance batches, the service doesn't appear to make any internal optimizations for larger batches), machine size, etc. We're not sure if it's a resource issue, though we see these errors even with small batch sizes (instance counts); a sketch of how we split requests client-side is below.

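In case it's relevant, this is roughly how we control batch size on the client side: the full list of inputs (`instances` below is a placeholder) is split into fixed-size chunks and each chunk is sent through the retry wrapper from the sketch above. The chunk size of 16 is arbitrary, purely for illustration:

```python
def chunked(seq, size):
    """Yield successive fixed-size slices of a sequence."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]


instances = [...]  # placeholder: the full list of JSON-serializable inputs

predictions = []
for batch in chunked(instances, 16):  # e.g. at most 16 instances per request
    predictions.extend(predict_with_backoff(batch))
```
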
Anybody else experiencing these issues? Any best practices to suggest? Thanks!

Neil
  • What is the machine type used with AI Platform serving? – guillaume blaquiere Sep 10 '20 at 13:41
  • We have tried a couple different sizes, right now we're using n1-highcpu-8. (For testing the custom predictions, we're using the mls1-c1-m2 machines.) – Neil Sep 10 '20 at 20:08
  • Do you really need an n1-highcpu-8? Or is it just to test whether you get fewer errors with a bigger VM? – guillaume blaquiere Sep 10 '20 at 20:13
  • We are still testing, trying to see what the sweet spot is. Testing with n1-highcpu-4, we were seeing a much higher error rate, and slower performance. – Neil Sep 10 '20 at 22:36
  • Can you consider [this article](https://medium.com/google-cloud/portable-prediction-with-tensorflow-and-cloud-run-669c1c73ebd1)? I have also observed that AI Platform serving scales strangely and far less well than Cloud Run. If you have the courage to listen to my French accent, [I gave a talk on this](https://www.youtube.com/watch?v=0x7gSuJ_Ugk). In addition, Cloud Run will soon support 4 CPUs. Let me know if you need help with this, and whether it solves your issue. – guillaume blaquiere Sep 11 '20 at 07:54
  • Thanks - I saw you had linked to that article on another post, and had taken a look. Our models are likely too big to fit within the constraints there, or, at least, some of them will be. So, hoping we can get there with AI Prediction! – Neil Sep 11 '20 at 19:10
  • OK, just so you know, you can use 4 GB of memory with Cloud Run (in beta) – guillaume blaquiere Sep 12 '20 at 11:38
  • Ah, ok - looks like max 2 cpu / 4 GB ram on these now. Looking more at your write-up - you are basically just using a container running tensorflow-serving into Cloud Run, and letting Cloud Run autoscale for you, correct? A couple questions - first, isn't that essentially exactly what AI Prediction does (or should do)? Second, what sort of scaling behavior have you seen, if you don't mind - how quickly do instances come up, do you experience timeout issues, etc? Thanks! – Neil Sep 14 '20 at 06:58
  • AI Platform serving does exactly the same thing, maybe with more improvements under the hood. However, after a scale to 0, I observed cold starts above 30s (the timeout of my cURL request). With Cloud Run, the start is very fast (about 1 second, maybe more with your big schema). In addition, scale-up is slow with AI Platform, I guess based on CPU usage; Cloud Run scales according to the number of requests. Finally, the minimum billing increment of AI Platform is 15 minutes (even for a quick request of 1s), whereas with Cloud Run you pay exactly the processing time (rounded up to the next 100ms). – guillaume blaquiere Sep 14 '20 at 08:10
  • Update: we've gotten it a bit more stable with more iteration on sizing, exponential backoff, etc. We still see issues with "Timed out waiting for notification" errors (which surface as 504s). Trying to track that down. Retries seem to give eventual success, but it would be good to understand what's going on. – Neil Sep 29 '20 at 07:03

0 Answers