I'm migrating from Lucene 3.0.1 to 4.1.0. After a few days of analysis, I suspect the two versions differ in how they filter query results: after the migration, the same queries and filters return different results.
The setup is as follows:
I was using Lucene 3.0.1, but the StandardAnalyzer for the IndexWriter, for example, was configured like this:
new StandardAnalyzer(Version.LUCENE_24)
The same configuration was used for the QueryParser. A few Fields are NOT_ANALYZED (i.e. indexed as a single term without going through the analyzer; this Field.Index option is deprecated in 4.x), and these cause the problem after migrating to 4.0.0 or 4.1.0. The problem is that the values of some of the NOT_ANALYZED Fields are UPPER CASE. The search process looks as follows:
- The QueryParser gets a Field and a keyword (a Document has many values for the same Field, and these are the most important information for users).
- Filters with additional user criteria are prepared as QueryWrapperFilter(TermQuery(...)).
- I override getDocIdSet from org.apache.lucene.search.Filter, iterate over all the prepared Filters calling filter.getDocIdSet(IndexReader), and collect the filtered elements.
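Roughly, the combining step in the last bullet behaves like the sketch below. It is plain Java, with java.util.BitSet standing in for Lucene's DocIdSet; the class and method names are made up for illustration, and it assumes the per-filter results are ANDed together, which may not be the only way to "collect" them.

```java
import java.util.BitSet;
import java.util.List;

// Hypothetical stand-in for the combined Filter; not Lucene API.
final class CombinedFilterSketch {

    // Intersect the doc-id bit sets produced by the individual filters.
    // Assumption: a document must pass every filter to be kept.
    static BitSet intersect(List<BitSet> perFilterDocIds, int maxDoc) {
        BitSet result = new BitSet(maxDoc);
        result.set(0, maxDoc); // start with every document accepted
        for (BitSet docIds : perFilterDocIds) {
            result.and(docIds); // drop documents this filter rejects
        }
        return result;
    }
}
```

With this shape, a single filter whose TermQuery matches nothing empties the whole result, which would explain why one mismatching term is enough to change what the user sees.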
I have found this answer regarding case sensitivity. I know that LowerCaseFilter is used in Lucene 2.4. What I did was rebuild the index with 4.x so that all NOT_ANALYZED values are now lower case. After that, the problem disappeared.
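That fix is consistent with exact-term matching, which the following plain-Java sketch tries to illustrate. A HashMap stands in for the term dictionary and none of the names are Lucene API; it only assumes that a NOT_ANALYZED value is stored as one term exactly as given, and that a term lookup compares terms exactly.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical model of a term dictionary; not Lucene API.
final class ExactTermLookupSketch {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // A not-analyzed value becomes a single term, case preserved.
    void addNotAnalyzed(int docId, String value) {
        postings.computeIfAbsent(value, k -> new HashSet<>()).add(docId);
    }

    // Exact lookup: "ABC" and "abc" are different terms.
    Set<Integer> termLookup(String term) {
        return postings.getOrDefault(term, Collections.<Integer>emptySet());
    }

    // What a lower-casing step in an analysis chain does to query text.
    static String lowerCasedLikeAnalyzer(String text) {
        return text.toLowerCase(Locale.ROOT);
    }
}
```

In this model, indexing "ABC" and looking up "ABC" finds the document, while looking up the lower-cased "abc" finds nothing. It does not explain why the old setup tolerated the mismatch, only why lower-casing the stored values makes it go away.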
What could be the reason that, with my solution on 3.0.3, case sensitivity "does not matter", while in 4.x it "matters"? Maybe someone could explain what is happening under the hood.