What is a multi-token string?

Question

I was browsing Elasticsearch issues on GitHub and came across a comment stating that:

Multi-token string fields are not sortable by Elasticsearch in any predictable way.

For reference, the field in question is a string timestamp of the form "14/05/08-13:41:23".

From the context, I assume that any non-alphanumeric string (one containing characters outside [A-Za-z0-9]) is a multi-token string. Is that correct?

score 1 · Accepted Answer · edited May 23 '17 at 10:26

OK, I mistook it for a general concept about strings that I did not know, but it actually seems to be Elasticsearch-specific jargon:

By default, when processing fields mapped as strings, Elasticsearch analyzes them and tries to break them into multiple tokens, which seems to be the case for strings containing / or . As a consequence, those strings become "multi-token strings". To avoid that, one needs to edit the Elasticsearch mappings and set the field to "not_analyzed", e.g.:

"my_field2": { "type": "string", "index": "not_analyzed" }

see here and there for reference.
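
To make the idea concrete, here is a minimal sketch of what "broken into multiple tokens" means for the timestamp from the question. It is only a rough approximation: the helper approximate_tokens is hypothetical and simply splits on non-alphanumeric characters, which is an assumption and not Elasticsearch's actual analyzer.

```python
import re


def approximate_tokens(value):
    """Hypothetical stand-in for an analyzer: lowercase the value and
    split it on every run of non-alphanumeric characters."""
    return [tok for tok in re.split(r"[^0-9A-Za-z]+", value.lower()) if tok]


timestamp = "14/05/08-13:41:23"

# Analyzed field: the value is indexed as several independent tokens,
# so there is no single term to sort on.
analyzed = approximate_tokens(timestamp)

# "not_analyzed" field: the whole value is kept as one term.
not_analyzed = [timestamp]

print(analyzed)      # ['14', '05', '08', '13', '41', '23']
print(not_analyzed)  # ['14/05/08-13:41:23']
```

With six separate tokens per document there is no obvious single value to sort by, which is presumably why sorting on such a field is described as unpredictable, whereas the not_analyzed version keeps one term per document.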

