I am training a model using Google's Document AI. The training fails with the following error (I have included only a part of the JSON file for simplicity but the error is identical for all documents in my dataset):
"trainingDatasetValidation": {
"documentErrors": [
{
"code": 3,
"message": "Invalid document.",
"details": [
{
"@type": "type.googleapis.com/google.rpc.ErrorInfo",
"reason": "INVALID_DOCUMENT",
"domain": "documentai.googleapis.com",
"metadata": {
"num_fields": "0",
"num_fields_needed": "1",
"document": "5e88c5e4cc05ddb8.json",
"annotation_name": "INCOME_ADJUSTMENTS",
"field_name": "entities.text_anchor.text_segments"
}
}
]
}
What I understand from this error is that the model expects the field INCOME_ADJUSTMENTS
to appear (at least) once in the document but instead, it finds zero instances of it.
That would have been understandable except I have already defined the field INCOME_ADJUSTMENTS
in my schema as "Optional Once", i.e., this field can appear either zero or one time.
Am I missing something? Why does this error persist despite the fact that it is addressed in the schema?
p.s. I have also tried "Optional multiple" (and "Required once" and "Required multiple") and the error persists.
EDIT: As requested, here's what one of the JSON files looks like. Note that there is no PII here as the details (name, SSN, etc.) are synthetic data.