In the returned error message, the value being indexed into the `number` field is a string containing alphabetical characters, `'BS-000011/2022'`. This is no problem for the `number` field, which has a `keyword` type. However, it is an issue for the `sequenceNumber` sub-field, which has an `integer` type. Because `sequenceNumber` is a sub-field of `number`, the same text value passed into `number` is also passed into `sequenceNumber`, hence the error.
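The failure can be mimicked with a minimal Python sketch. It is an analogy, not Elasticsearch's actual parsing code: the `MAPPING` dict is a hypothetical reconstruction of the mapping described above, and Python's `int()` stands in for the `integer` field's number parsing.

```python
# Hypothetical reconstruction of the mapping described in the question:
# `number` is a keyword, with an integer sub-field `sequenceNumber`.
MAPPING = {
    "number": {
        "type": "keyword",
        "fields": {"sequenceNumber": {"type": "integer"}},
    }
}


def index_value(value: str) -> dict:
    """Mimic how a sub-field receives the *same* raw value as its parent."""
    indexed = {"number": value}  # keyword: any string is accepted as-is
    for sub, spec in MAPPING["number"]["fields"].items():
        if spec["type"] == "integer":
            # integer sub-field: the whole raw string must parse as a number
            indexed[f"number.{sub}"] = int(value)
    return indexed
```

Calling `index_value("BS-000011/2022")` raises a `ValueError`, while `index_value("11")` succeeds, which is the same distinction the integer sub-field makes.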
Unfortunately, the custom analyzer used in the previous question won't help either: it is applied to a `text` field, and sorting on a `text` field isn't supported by default. However, the tokenizer used by the custom analyzer `document_number_analyzer` can be repurposed as an ingest pipeline.
For context, here is the custom tokenizer provided by the author in the previous question:

```json
"tokenizer": {
  "document_number_tokenizer": {
    "type": "pattern",
    "pattern": "-0*([1-9][0-9]*)\/",
    "group": 1
  }
}
```
If the custom analyzer is run on the value above with the Elasticsearch `_analyze` API (`stack_index` being a temporary index that defines the analyzer), like so:

```
POST stack_index/_analyze
{
  "analyzer": "document_number_analyzer",
  "text": ["BS-000011/2022"]
}
```
The analyzer returns a single token, `11`, but tokens are used for search analysis, not sorting.
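What the tokenizer does with that value can be reproduced with a rough Python emulation. This is an assumption-based sketch: Python's `re` module stands in for the Java regular expressions the pattern tokenizer uses, which behave the same for this simple pattern. With `"group": 1`, each regex match emits only the text of capture group 1 as a token; `pattern_tokenize` is a hypothetical helper.

```python
import re

# Same regex as "document_number_tokenizer"; the JSON "\/" is just an escaped "/".
DOCUMENT_NUMBER_PATTERN = re.compile(r"-0*([1-9][0-9]*)/")


def pattern_tokenize(text: str, group: int = 1) -> list[str]:
    """Rough emulation of a pattern tokenizer configured with "group": 1:
    every match of the regex emits the text of that capture group as a token."""
    return [m.group(group) for m in DOCUMENT_NUMBER_PATTERN.finditer(text)]
```

`pattern_tokenize("BS-000011/2022")` yields `["11"]`. That token is a string used for matching in the inverted index; it does not give Elasticsearch a numeric field value to sort documents by.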
An Elasticsearch ingest pipeline with a grok processor can be applied to the index to extract the desired number from the value and index it as an integer. The processor needs to be configured to expect the value's format, which here is similar to `'BS-000011/2022'`. An example is provided below:

```
PUT _ingest/pipeline/numberSort
{
  "processors": [
    {
      "grok": {
        "field": "number",
        "patterns": ["%{WORD}%{ZEROS}%{SORTVALUES:sequenceNumber:int}%{SEPARATE}%{NUMBER}"],
        "pattern_definitions": {
          "SEPARATE": "[/]",
          "ZEROS": "[-0]*",
          "SORTVALUES": "[1-9][0-9]*"
        }
      }
    }
  ]
}
```
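Grok takes an input text value and extracts structured fields from it. The sortable number is extracted by the `SORTVALUES` pattern, `%{SORTVALUES:sequenceNumber:int}`, which creates a new field called `sequenceNumber` in the document; the `:int` suffix converts the captured text to an integer. The remaining patterns consume the rest of the value: `%{WORD}` matches `BS`, `%{ZEROS}` matches the hyphen and leading zeros (`-0000`), `%{SEPARATE}` matches `/`, and `%{NUMBER}` matches `2022`. When `'BS-000011/2022'` is indexed into the `number` field, `11` is indexed into the `sequenceNumber` field as an integer.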
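The extraction described above can be approximated outside Elasticsearch to check the pattern against sample values before creating the pipeline. This is a minimal Python sketch, not the grok processor itself: `compile_grok` and `run_pipeline` are hypothetical helpers, and the `WORD` and `NUMBER` regexes are simplified stand-ins (an assumption) for grok's built-in definitions.

```python
import re

# Pattern definitions copied from the numberSort pipeline.
PATTERN_DEFINITIONS = {
    "SEPARATE": r"[/]",
    "ZEROS": r"[-0]*",
    "SORTVALUES": r"[1-9][0-9]*",
}
# Simplified stand-ins (assumption) for grok's built-in WORD and NUMBER patterns.
BUILTINS = {
    "WORD": r"\b\w+\b",
    "NUMBER": r"[+-]?(?:\d+(?:\.\d+)?|\.\d+)",
}


def compile_grok(pattern: str) -> re.Pattern:
    """Expand %{NAME} and %{NAME:field[:type]} references into a Python regex."""
    definitions = {**BUILTINS, **PATTERN_DEFINITIONS}

    def expand(m: re.Match) -> str:
        name, field = m.group(1), m.group(2)
        body = definitions[name]
        # Named references become named groups; the rest are non-capturing.
        return f"(?P<{field}>{body})" if field else f"(?:{body})"

    return re.compile(re.sub(r"%\{(\w+)(?::(\w+))?(?::\w+)?\}", expand, pattern))


GROK = compile_grok("%{WORD}%{ZEROS}%{SORTVALUES:sequenceNumber:int}%{SEPARATE}%{NUMBER}")


def run_pipeline(doc: dict) -> dict:
    """Apply the grok step to a document; the ':int' suffix is honored here
    by converting the captured text explicitly."""
    m = GROK.search(doc["number"])
    if m is None:
        raise ValueError(f"no grok pattern matched {doc['number']!r}")
    return {**doc, "sequenceNumber": int(m.group("sequenceNumber"))}
```

For example, `run_pipeline({"number": "BS-000011/2022"})` returns the document with an added `"sequenceNumber": 11`, and a list of such documents can then be sorted numerically on that key.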
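You can then create an index template that applies the ingest pipeline, for example through the `index.default_pipeline` index setting. The `sequenceNumber` field needs to be explicitly mapped with an `integer` type, as its own field rather than as a sub-field of `number`. As long as the value indexed into the `number` field matches the format above, the ingest pipeline will automatically populate `sequenceNumber`; a value that doesn't match will make the grok processor fail unless failure handling (such as `ignore_failure` or `on_failure`) is configured. The `sequenceNumber` field will then be available to sort on.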
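As a sketch of the index template step, the request bodies can be built in Python and printed as JSON for use in Kibana Dev Tools or any HTTP client. The template name `document-numbers-template` and the index pattern `documents-*` are hypothetical; the sketch assumes the composable `_index_template` API and uses the `index.default_pipeline` setting to run `numberSort` on every document indexed into matching indices.

```python
import json

PIPELINE = "numberSort"  # pipeline created with PUT _ingest/pipeline/numberSort

# Body for: PUT _index_template/document-numbers-template (hypothetical name)
index_template = {
    "index_patterns": ["documents-*"],  # hypothetical index pattern
    "template": {
        "settings": {
            # Run the grok pipeline on every document indexed into matching indices.
            "index.default_pipeline": PIPELINE,
        },
        "mappings": {
            "properties": {
                "number": {"type": "keyword"},
                # Explicit integer mapping for the field the pipeline fills in.
                "sequenceNumber": {"type": "integer"},
            }
        },
    },
}

# Body for: GET documents-*/_search (sorts numerically on the extracted field)
sorted_search = {
    "sort": [{"sequenceNumber": {"order": "asc"}}],
}

if __name__ == "__main__":
    print(json.dumps(index_template, indent=2))
    print(json.dumps(sorted_search, indent=2))
```

Keeping `number` as a plain `keyword` (with no integer sub-field) avoids the original parsing error, while `sequenceNumber` carries the numeric value used for sorting.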