Lucene does not by default allow leading wildcards in search terms, but this can be enabled with:
QueryParser#setAllowLeadingWildcard(true)
I understand that use of a leading wildcard prevents Lucene from using the index. Searches with a leading wildcard must scan the entire index.
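To make that understanding concrete, here is a minimal sketch in plain Java. It is a toy model, not Lucene's actual implementation: the class `TermScanSketch` and the methods `prefixMatch` and `suffixMatch` are hypothetical names, and a `TreeSet` stands in for a sorted term dictionary. Under that assumption, a trailing wildcard can seek to the first candidate and stop early, while a leading wildcard has to test every term.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Toy model only: a TreeSet stands in for a sorted term dictionary.
// None of these names are Lucene APIs.
public class TermScanSketch {

    // Trailing wildcard (e.g. ring*): seek to the first term >= prefix,
    // then stop as soon as a term no longer starts with the prefix.
    static List<String> prefixMatch(TreeSet<String> terms, String prefix) {
        List<String> out = new ArrayList<>();
        for (String t : terms.tailSet(prefix, true)) {
            if (!t.startsWith(prefix)) {
                break;
            }
            out.add(t);
        }
        return out;
    }

    // Leading wildcard (e.g. *ing): sort order does not help,
    // so every term in the dictionary is tested.
    static List<String> suffixMatch(TreeSet<String> terms, String suffix) {
        List<String> out = new ArrayList<>();
        for (String t : terms) {
            if (t.endsWith(suffix)) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        TreeSet<String> terms = new TreeSet<>(
                Arrays.asList("bring", "ring", "ringer", "sing", "song"));
        System.out.println(prefixMatch(terms, "ring")); // [ring, ringer]
        System.out.println(suffixMatch(terms, "ing"));  // [bring, ring, sing]
    }
}
```

In this toy model the cost of the suffix match grows with the number of distinct terms, not with the number of documents that contain them.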
How do I demonstrate the performance cost of a leading wildcard query? When is it OK to use setAllowLeadingWildcard(true)?
I have built a test index with 10 million documents in the form:
{ name: random_3_word_phrase }
The index is 360 MB on disk.
My test queries perform well, and I have been unable to demonstrate a performance problem. For example, querying for name:*ing returns over 1.1 million documents in less than 1 second. Querying for name:*ing* returns over 1.5 million documents in the same time.
What is going on here? Why isn't this slow? Is 10,000,000 documents not enough? Do the documents need to contain more than just a single field?