
The Lucene documentation states:

If the documents you are indexing are very large: Lucene by default only indexes the first 10,000 terms of a document to avoid OutOfMemory errors.

This limit can, however, be configured with IndexWriter.setMaxFieldLength(int).

I created an index in Elasticsearch (http://localhost:9200/twitter) and posted a document with 40,000 terms in it.

The mapping:

{
    "twitter": {
        "mappings": {
            "tweet": {
                "properties": {
                    "filter": {
                        "properties": {
                            "term": {
                                "properties": {
                                    "message": {
                                        "type": "string"
                                    }
                                }
                            }
                        }
                    },
                    "message": {
                        "type": "string",
                        "analyzer": "standard"
                    }
                }
            }
        }
    }
}

I indexed a document whose message field has 40,000 terms: message: "text1 text2 ... text40000".
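A minimal sketch of that experiment in Python, using only the standard library. The base URL and the twitter index / tweet type come from the question; the helper names build_message, make_index_request and send are hypothetical, and the /{index}/{type} URL form assumes the older Elasticsearch versions (with mapping types) that the question's "string" mapping implies.

```python
import json
import urllib.request


def build_message(n: int) -> str:
    """Return "text1 text2 ... textN", the shape of the test document."""
    return " ".join(f"text{i}" for i in range(1, n + 1))


def make_index_request(base_url: str, index: str, doc_type: str, doc: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /{index}/{doc_type} request with a JSON body."""
    return urllib.request.Request(
        f"{base_url}/{index}/{doc_type}",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request) -> dict:
    """Send the request to a running node and decode Elasticsearch's JSON response."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Against a running node, `send(make_index_request("http://localhost:9200", "twitter", "tweet", {"message": build_message(40_000)}))` posts the document. Building the request separately from sending it lets you inspect the payload without a server.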

Since the standard analyzer splits the text on word boundaries (here, the spaces), all 40,000 terms were indexed.
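To see why 40,000 terms come out, here is a rough Python approximation of the tokenization step. It is only a stand-in (runs of word characters, lowercased), not Lucene's StandardTokenizer, which implements the Unicode UAX #29 word-break rules; on simple ASCII input like "text1 text2 ..." the two should agree. The function names are hypothetical.

```python
import re
from typing import List


def build_message(n: int) -> str:
    """Return "text1 text2 ... textN", like the question's test document."""
    return " ".join(f"text{i}" for i in range(1, n + 1))


def approximate_standard_tokens(text: str) -> List[str]:
    """Crude stand-in for the standard analyzer: word-character runs, lowercased.

    Lucene's StandardTokenizer follows UAX #29 word-break rules; this regex is
    only a rough match, adequate for simple ASCII input like this one.
    """
    return [token.lower() for token in re.findall(r"\w+", text)]


tokens = approximate_standard_tokens(build_message(40_000))
print(len(tokens), len(set(tokens)))  # 40000 40000
```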

My question is: does Elasticsearch set a limit on the number of terms Lucene indexes? If so, what is that limit?

If not, how did all 40,000 of my terms get indexed? It shouldn't have indexed more than 10,000 terms.

Adon Smith

1 Answer


The source you're citing seems out of date: IndexWriter.setMaxFieldLength(int) was deprecated in Lucene 3.4 and is no longer available in Lucene 4+, on which ES is based. It has been replaced by LimitTokenCountAnalyzer. However, I don't think such a limit exists anymore, or at least none is set explicitly in the Elasticsearch codebase.
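LimitTokenCountAnalyzer wraps another analyzer and stops emitting tokens once a maximum count is reached. The Python sketch below only mimics that idea by applying itertools.islice to a crude regex tokenizer; it is not Lucene's API, and the 10,000 cap is chosen just to mirror the old default quoted in the question. Without such a wrapper, nothing truncates the stream, which matches what you observed.

```python
import re
from itertools import islice
from typing import Iterable, Iterator


def tokenize(text: str) -> Iterator[str]:
    """Hypothetical stand-in for an analyzer: lowercase word-character runs."""
    for match in re.finditer(r"\w+", text):
        yield match.group(0).lower()


def limit_token_count(tokens: Iterable[str], max_token_count: int) -> Iterator[str]:
    """Mimic the idea behind a token-count limit: pass through at most
    max_token_count tokens and silently drop the rest."""
    return islice(tokens, max_token_count)


message = " ".join(f"text{i}" for i in range(1, 40_001))
kept = list(limit_token_count(tokenize(message), 10_000))
print(len(kept), kept[-1])  # 10000 text10000
```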

The only limits you might run into while indexing documents relate either to the HTTP payload size or to Lucene's internal buffer size, as explained in this post.
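For the HTTP side, a quick size check shows how far the question's document is from any payload limit. The sketch assumes Elasticsearch's http.max_content_length setting at its default of 100mb (interpreted as 100 × 1024 × 1024 bytes); treat that value as an assumption and check your own node's configuration. The helper names are hypothetical.

```python
import json

# Assumption: http.max_content_length left at its default of 100mb.
ASSUMED_MAX_CONTENT_LENGTH = 100 * 1024 * 1024


def build_message(n: int) -> str:
    """Return "text1 text2 ... textN", like the question's test document."""
    return " ".join(f"text{i}" for i in range(1, n + 1))


def payload_size_bytes(doc: dict) -> int:
    """Size in bytes of the UTF-8 JSON body that would be sent over HTTP."""
    return len(json.dumps(doc).encode("utf-8"))


size = payload_size_bytes({"message": build_message(40_000)})
print(size, size < ASSUMED_MAX_CONTENT_LENGTH)  # 388908 True
```

At roughly 380 KB, the 40,000-term document is several hundred times smaller than the assumed limit, so the HTTP payload size would not have stopped it either.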

Val