
I want to run batch predictions inside Google Cloud's Vertex AI using a custom-trained model. I was able to find documentation for getting online prediction working with a custom-built Docker image by setting up an endpoint, but I can't find any documentation on what the Dockerfile should look like for batch prediction. Specifically, how does my custom code get fed the input, and where does it put the output?

The documentation I've found is here; it certainly looks possible to use a custom model, and when I tried it the job didn't complain at first, but eventually it threw an error. According to the documentation, no endpoint is required for running batch jobs.
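For reference, the custom-container requirements linked in the comments below describe the serving contract Vertex AI expects: the container listens on the port in `AIP_HTTP_PORT`, answers health checks on `AIP_HEALTH_ROUTE`, and receives POST bodies of the form `{"instances": [...]}` on `AIP_PREDICT_ROUTE`, replying with `{"predictions": [...]}`. As far as I can tell, batch prediction drives this same contract (Vertex AI spins up the container and POSTs batches of instances to it, writing the responses to the output bucket). A minimal standard-library sketch of such a server; `predict` is a placeholder, not real model code:

```python
import json
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(instances):
    """Placeholder model logic -- replace with real inference code."""
    return [{"echo": inst} for inst in instances]


class VertexHandler(BaseHTTPRequestHandler):
    # Vertex AI tells the container where to serve via environment variables;
    # the defaults below match the documented fallbacks.
    health_route = os.environ.get("AIP_HEALTH_ROUTE", "/health")
    predict_route = os.environ.get("AIP_PREDICT_ROUTE", "/predict")

    def do_GET(self):
        # Health checks: respond 200 once the model is ready to serve.
        if self.path == self.health_route:
            self.send_response(200)
            self.end_headers()
        else:
            self.send_response(404)
            self.end_headers()

    def do_POST(self):
        # Prediction requests: body is {"instances": [...]}; the response
        # must be {"predictions": [...]} with one entry per instance.
        if self.path != self.predict_route:
            self.send_response(404)
            self.end_headers()
            return
        length = int(self.headers.get("Content-Length", 0))
        request = json.loads(self.rfile.read(length))
        body = json.dumps({"predictions": predict(request["instances"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep container logs quiet; wire this to real logging as needed


def serve():
    """Entry point for the container's ENTRYPOINT/CMD."""
    port = int(os.environ.get("AIP_HTTP_PORT", "8080"))
    HTTPServer(("0.0.0.0", port), VertexHandler).serve_forever()
```

The Dockerfile would then just copy this script in and run `serve()`; whether batch jobs honor anything beyond this contract is exactly what the docs don't spell out.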

shortcipher3
    What error message did you get? What format is your input? (JSONL, TFRecord, CSV, file list) Can you add a sample of it? – Ksign Sep 22 '21 at 08:38
  • I see `Job failed. See logs for full details.` I have checked the logs explorer and I see logs for my Vertex AI endpoint, but not for my batch job. – shortcipher3 Sep 22 '21 at 15:46
  • File list is my preferred input. Mine looks very similar to the example in the documentation, with a different bucket and filenames: ``` gs://path/to/image/image1.jpg gs://path/to/image/image2.jpg ``` – shortcipher3 Sep 22 '21 at 15:48
  • Your input format looks good. It would be useful if you could add more details anyway to your question (code snippet, how you are calling,...). BTW, you do need a Vertex AI endpoint to access your model. – Ksign Sep 29 '21 at 14:58
  • I suggest you go through this documentation to check the [container requirements for predictions](https://cloud.google.com/vertex-ai/docs/predictions/custom-container-requirements#prediction) and [how to use it](https://cloud.google.com/vertex-ai/docs/predictions/use-custom-container). Then, follow this [tutorial](https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/4db356050109b31789c0b6acd01f66cb5ef5ee15/notebooks/official/custom/sdk-custom-image-classification-batch.ipynb) and compare with your code. – Ksign Sep 29 '21 at 14:58
  • Other things you could do is use one of the [pre-built images for containers](https://cloud.google.com/vertex-ai/docs/predictions/pre-built-containers) with your code or pull one and use `docker image history [OPTIONS] IMAGE` to see the dockerfile commands and compare with your dockerfile – Ksign Sep 29 '21 at 15:01
  • [To customize how Vertex AI serves online predictions from your custom-trained model, you can specify a custom container](https://cloud.google.com/vertex-ai/docs/predictions/use-custom-container) This document is about online predictions, not batch predictions. It isn't clear to me that the same mechanism will work for batch predictions. – shortcipher3 Oct 01 '21 at 22:16
  • The docker containers use `tensorflow_serving` to serve models, I'll try to understand how that works to see if I can get something working. – shortcipher3 Oct 01 '21 at 22:18
  • The `scikit-learn` image uses a python script, I think I'll start with that. Thanks – shortcipher3 Oct 01 '21 at 22:21
  • Sure, but if you need more help, feel free to share more details in your question (code snippet, how you are calling it, ...) so we have better insight. – Ksign Oct 04 '21 at 11:47
  • My sample code:
    ```
    model = aiplatform.Model(model_path)
    batch_prediction_job = model.batch_predict(
        gcs_source=gcs_source,
        gcs_destination_prefix=gcs_destination,
        machine_type='n1-standard-4',
        instances_format='csv',
        sync=False
    )
    ```
    The error is:
    ```
    File "/usr/local/lib/python3.7/site-packages/google/cloud/aiplatform/base.py", line 676, in resource_name
        self._assert_gca_resource_is_available()
    ...
    RuntimeError: BatchPredictionJob resource has not been created.
    ```
    – havryliuk Jan 18 '23 at 14:45
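On the `RuntimeError: BatchPredictionJob resource has not been created` in the last comment: with `sync=False`, `batch_predict` returns before the job resource actually exists on the server, so reading attributes like `resource_name` too early raises exactly that error; calling `wait()` on the returned job first avoids it. A sketch assuming the `google-cloud-aiplatform` SDK; the model path and bucket URIs are hypothetical placeholders, and the function is only defined here (it needs GCP credentials to run):

```python
def make_file_list(image_uris):
    """Vertex AI's file-list input format is one Cloud Storage URI per line."""
    return "\n".join(image_uris)


def run_batch_job(model_name, gcs_source, gcs_destination):
    """Hypothetical batch prediction call; requires google-cloud-aiplatform
    and GCP credentials, so the import is deferred into the function."""
    from google.cloud import aiplatform

    model = aiplatform.Model(model_name)  # e.g. a full projects/.../models/... resource name
    job = model.batch_predict(
        gcs_source=gcs_source,                   # e.g. gs://my-bucket/file_list.txt
        gcs_destination_prefix=gcs_destination,  # predictions land under this prefix
        machine_type="n1-standard-4",
        instances_format="file-list",
        sync=False,                              # returns before the job resource exists
    )
    job.wait()  # block until the job completes; resource_name is safe to read afterwards
    return job.resource_name
```

With `sync=True` the same creation happens inline and the error does not occur, at the cost of blocking the caller for the whole job.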

0 Answers