I'm trying to run a Google Cloud Platform AutoML batch text classification job for a model that was recently trained successfully.

I've prepared the input data for batch classification based on the documentation, but I always get this error:

BatchPrediction could not start because no valid instances were found in the input file.

I tried 3 different JSONL formats found for AutoML text classification input data.

{"instances": [
    {"id":"1","text_snippet":{"content":"text content goes here",mimeType:"text/plain"}},
    {"id":"2","text_snippet":{"content":"text content goes here",mimeType:"text/plain"}},
    {"id":"3","text_snippet":{"content":"text content goes here",mimeType:"text/plain"}},
    {"id":"4","text_snippet":{"content":"text content goes here",mimeType:"text/plain"}},
    {"id":"5","text_snippet":{"content":"text content goes here",mimeType:"text/plain"}},
]}

,

{"id":"1","text_snippet":{"content":"text content goes here",mimeType:"text/plain"}}
{"id":"2","text_snippet":{"content":"text content goes here",mimeType:"text/plain"}}
{"id":"3","text_snippet":{"content":"text content goes here",mimeType:"text/plain"}}
{"id":"4","text_snippet":{"content":"text content goes here",mimeType:"text/plain"}}
{"id":"5","text_snippet":{"content":"text content goes here",mimeType:"text/plain"}}

and

{"content":"text content goes here",mimeType:"text/plain"}
{"content":"text content goes here",mimeType:"text/plain"}
{"content":"text content goes here",mimeType:"text/plain"}
{"content":"text content goes here",mimeType:"text/plain"}
{"content":"text content goes here",mimeType:"text/plain"}
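One thing worth ruling out, independently of which schema the service expects: every line of a JSONL file has to parse as standalone JSON, and in the samples above the `mimeType` key is not quoted. The sketch below is a local sanity check only, using Python's standard `json` module; the helper name `invalid_jsonl_lines` is made up for illustration, and passing this check does not mean AutoML will accept the file.

```python
import json


def invalid_jsonl_lines(text):
    """Return (line_number, error_message) for each non-blank line that is not valid JSON."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        try:
            json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, exc.msg))
    return problems


# First line copies the unquoted mimeType key from the samples above;
# second line is the same record with the key quoted.
sample = (
    '{"content":"text content goes here",mimeType:"text/plain"}\n'
    '{"content":"text content goes here","mimeType":"text/plain"}\n'
)

problems = invalid_jsonl_lines(sample)
for number, message in problems:
    print(f"line {number}: {message}")
```

Only the first sample line is reported, because an unquoted key is not valid JSON. The first format (a single multi-line `{"instances": [...]}` object) would also fail a per-line check like this one, since it is not one JSON value per line.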

I created the test AutoML batch text classification using the web console with this configuration:

Question

What is the correct data input format for AutoML batch text classification?

Daniel Santos
  • It is not possible to use a JSONL file for batch prediction of text classification. Only a CSV file is accepted for text classification. This is indicated in the [AutoML Natural Language documentation](https://cloud.google.com/natural-language/automl/docs/predict#batch_prediction). The CSV file should contain only one input file per row. The CSV file and each input file need to be stored in your Cloud Storage bucket. – Ricco D Jan 11 '21 at 07:58
  • Under following documentation it states you need to use JSONL for text classification too: https://cloud.google.com/vertex-ai/docs/predictions/batch-predictions – Zappageck Jul 27 '21 at 14:20
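As a rough illustration of the CSV layout described in the first comment (one input file per row, each referenced by its Cloud Storage location), the sketch below builds such a file in memory. The bucket and object names are hypothetical, the helper `build_batch_csv` is made up for illustration, and the exact format should be confirmed against the documentation linked in the comments, which differ between AutoML Natural Language and Vertex AI.

```python
import csv
import io


def build_batch_csv(gcs_uris):
    """Return CSV text with one Cloud Storage URI per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for uri in gcs_uris:
        if not uri.startswith("gs://"):
            raise ValueError(f"not a Cloud Storage URI: {uri}")
        writer.writerow([uri])
    return buffer.getvalue()


# Hypothetical bucket and object names, for illustration only.
# Each .txt object would hold the text of one document to classify.
uris = [f"gs://my-bucket/batch-input/doc_{i}.txt" for i in range(1, 4)]

csv_text = build_batch_csv(uris)
print(csv_text)
```

The resulting text would then be saved as a `.csv` object in the same bucket and selected as the batch prediction input; the upload step is left out here because it depends on the tooling in use.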

0 Answers