Lucene mentions that -
If The document you are indexing are very large. Lucene by default only indexes the first 10,000 terms of a document to avoid OutOfMemory errors
though we can configure it by IndexWriter.setMaxFieldLength(int).
I created an index in elasticsearch - http://localhost:9200/twitter and posted a document with 40,000 terms in it.
mapping -
{
"twitter": {
"mappings": {
"tweet": {
"properties": {
"filter": {
"properties": {
"term": {
"properties": {
"message": {
"type": "string"
}
}
}
}
},
"message": {
"type": "string",
"analyzer": "standard"
}
}
}
}
} }
i indexed a document with message field has 40,000 terms - message: "text1 text2 .... text40000" .
Since standard analyzer analyzes on space it has indexed 40,000 terms.
My point is Does elasticsearch sets a limit of number of indexed terms on lucene ? If yes what is that limit ?
If no, how my all 40,000 terms got indexed , it shouldn't have indexed terms more than 10000.