
I'm migrating from Lucene 3.0.1 to 4.1.0. After a few days of analysis, I suspect there is a difference in how query results are filtered in these versions: after the migration, the same queries and filters return different results.

The situation is as follows:

I was using Lucene 3.0.1, but the StandardAnalyzer for the IndexWriter, for example, was configured like this:

new StandardAnalyzer(Version.LUCENE_24)

The same configuration was used for the QueryParser. Some fields are NOT_ANALYZED (which I understood as "not indexed"; the constant is deprecated in 4.x), and these fields cause the problem after migrating to 4.0.0 or 4.1.0: the values of some NOT_ANALYZED fields are UPPER CASE. The search process looks as follows:

  1. The QueryParser gets the field (a document has many values for the same field; these are the most important information for users) and the keyword.
  2. Filters with additional user criteria are prepared as QueryWrapperFilter(TermQuery(...)).
  3. I override getDocIdSet from org.apache.lucene.search.Filter, iterate over all prepared filters calling filter.getDocIdSet(IndexReader), and collect the filtered elements.
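Step 3 can be modelled without Lucene to show where case sensitivity enters. This is a minimal plain-Java sketch, not Lucene's API: a java.util.BitSet stands in for a DocIdSet, a hypothetical in-memory list of documents stands in for the IndexReader, and the per-filter results are combined with AND semantics, which is an assumption because the post does not say how they are combined. Each filter does an exact string comparison, the way a TermQuery treats an un-analyzed field.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;

class FilterChainSketch {
    // Hypothetical stand-in for QueryWrapperFilter(TermQuery(...)):
    // returns the ids of documents whose field value equals the term exactly.
    static BitSet termFilter(List<Map<String, String>> docs, String field, String term) {
        BitSet bits = new BitSet(docs.size());
        for (int docId = 0; docId < docs.size(); docId++) {
            // Exact, case-sensitive comparison, as for an un-analyzed field.
            if (term.equals(docs.get(docId).get(field))) {
                bits.set(docId);
            }
        }
        return bits;
    }

    // Assumed AND semantics: a document survives only if every filter accepts it.
    static BitSet combineAll(List<BitSet> filterResults, int maxDoc) {
        BitSet result = new BitSet(maxDoc);
        result.set(0, maxDoc); // start with every document accepted
        for (BitSet bits : filterResults) {
            result.and(bits);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = List.of(
            Map.of("country", "PL", "type", "BOOK"),
            Map.of("country", "PL", "type", "FILM"),
            Map.of("country", "DE", "type", "BOOK"));
        List<BitSet> filters = new ArrayList<>();
        filters.add(termFilter(docs, "country", "PL"));
        filters.add(termFilter(docs, "type", "BOOK"));
        System.out.println(combineAll(filters, docs.size())); // {0}
        // A lower-case term never matches the upper-case stored value.
        System.out.println(termFilter(docs, "country", "pl")); // {}
    }
}
```

The exact-match comparison is the point of the sketch: nothing in the filter path lower-cases either side.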

I have found this answer regarding case sensitivity. I know that LowerCaseFilter is used by the Lucene 2.4 StandardAnalyzer. What I did was rebuild the index with 4.x, this time with all NOT_ANALYZED values lower-cased. Then the problem disappeared.
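Re-indexing with lower-cased values helps because the term in the filter and the term in the index then agree. A minimal plain-Java sketch of that normalization, assuming (the post does not say so) that the filter terms are lower-cased the same way before the TermQuery is built; normalize and termMatches are hypothetical helpers, not Lucene methods.

```java
import java.util.Locale;

class CaseNormalizationSketch {
    // Hypothetical helper: normalization applied to un-analyzed values before
    // indexing and to filter terms before querying.
    // Locale.ROOT avoids locale-specific surprises such as the Turkish dotless i.
    static String normalize(String value) {
        return value.toLowerCase(Locale.ROOT);
    }

    // Hypothetical stand-in for an exact term match on an un-analyzed field.
    static boolean termMatches(String indexedValue, String filterTerm) {
        return indexedValue.equals(filterTerm);
    }

    public static void main(String[] args) {
        String stored = "BOOK";
        String userTerm = "book";
        System.out.println(termMatches(stored, userTerm)); // false: raw value kept its case
        System.out.println(termMatches(normalize(stored), normalize(userTerm))); // true
    }
}
```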

What could be the reason that, for my solution, case sensitivity "does not matter" with 3.0.3 but "matters" with 4.x? Maybe someone could explain to me what is happening under the hood.

Marcin Sanecki
  • I may be misunderstanding, but it sounds like you are saying it should never have worked before, and now it actually doesn't, just as you'd expect. As for why in the world it used to work, your description of your process really doesn't help me there. I don't know what you are trying to say you're doing with QueryParser, but it will certainly apply a lowercase filter when parsing, and I don't know where you are dealing with unanalyzed fields. – femtoRgon Feb 05 '13 at 17:11
  • I tried to describe the whole situation, but it seems I failed. The problem was so strange to me that I decided to repeat the whole migration from the beginning. The result: there is no case-sensitivity problem with filtering any more. I don't know what to write now... I have checked all the steps I did during the previous migration; everything was the same. Maybe some old version of a *.class or *.jar was on the server. Hard to say. – Marcin Sanecki Feb 06 '13 at 10:58

1 Answer


Indexing and analyzing are two different things.

Analyzing means the field is put through the Analyzer of choice. Fields that are not analyzed are put in the index exactly as they are.

If you index an uppercase string, without analyzing, it will stay uppercase in the index and will not be found using a lowercase query.
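A toy inverted index in plain Java can illustrate the difference. It is a sketch, not how Lucene stores terms: the hypothetical analyze method only splits on whitespace and lower-cases (StandardAnalyzer also does proper tokenization and stop-word removal), and termQuery is an exact lookup standing in for Lucene's TermQuery.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class AnalyzedVsNotAnalyzedSketch {
    // field -> (term -> ids of documents containing that term)
    private final Map<String, Map<String, Set<Integer>>> postings = new HashMap<>();

    // Hypothetical, very crude analyzer: whitespace split plus lower-casing.
    private static String[] analyze(String text) {
        return text.toLowerCase(Locale.ROOT).split("\\s+");
    }

    void add(int docId, String field, String value, boolean analyzed) {
        Map<String, Set<Integer>> terms = postings.computeIfAbsent(field, f -> new HashMap<>());
        // Un-analyzed values are stored verbatim as a single term, case included.
        String[] tokens = analyzed ? analyze(value) : new String[] {value};
        for (String token : tokens) {
            terms.computeIfAbsent(token, t -> new HashSet<>()).add(docId);
        }
    }

    // Exact term lookup: no analysis is applied to the query term.
    Set<Integer> termQuery(String field, String term) {
        return postings.getOrDefault(field, Map.of()).getOrDefault(term, Set.of());
    }

    public static void main(String[] args) {
        AnalyzedVsNotAnalyzedSketch index = new AnalyzedVsNotAnalyzedSketch();
        index.add(0, "title", "Lucene IN Action", true);
        index.add(0, "code", "ABC-123", false);
        System.out.println(index.termQuery("title", "action")); // [0]
        System.out.println(index.termQuery("code", "abc-123"));  // []
        System.out.println(index.termQuery("code", "ABC-123"));  // [0]
    }
}
```

The lower-case term finds the analyzed field but misses the un-analyzed one, which only matches its original upper-case spelling.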

Rob Audenaerde
  • I agree, analysing is done during indexing and when parsing a query. But what happens when the query result is filtered? Is filtering case-sensitive? I don't see any connection between Filter and Analyser. – Marcin Sanecki Feb 05 '13 at 14:43
  • The filter that you use, QueryWrapperFilter, wraps a query that should behave exactly like a normal query does, i.e. when filtering on a non-analyzed field it is case sensitive, and when used on an analyzed field it depends on the analyzer. – Rob Audenaerde Feb 05 '13 at 20:03