In Luke, the following search expression returns 23 results:

docurl:www.siteurl.com  docfile:Tomatoes*

If I pass this same expression into my C# Lucene.NET app with the following implementation:

        IndexReader reader = IndexReader.Open(indexName);
        Searcher searcher = new IndexSearcher(reader);
        try
        {
            QueryParser parser = new QueryParser("docurl", new StandardAnalyzer());
            BooleanQuery bquery = new BooleanQuery();
            Query parsedQuery = parser.Parse(query);
            bquery.Add(parsedQuery, Lucene.Net.Search.BooleanClause.Occur.MUST);
            int _max = searcher.MaxDoc();
            BooleanQuery.SetMaxClauseCount(Int32.MaxValue);
            TopDocs hits = searcher.Search(parsedQuery, _max);
            ...
        }

I get 0 results

Luke is using StandardAnalyzer, and this is what the Explain Structure window looks like: [image: Luke Query Structure]

Must I manually create BooleanClause objects for each field I search on, specifying Should for each one then add them to the BooleanQuery object with .Add()? I thought the QueryParser would do this for me. What am I missing?

Edit: Simplifying a tad, docfile:Tomatoes* returns 23 docs in Luke, yet 0 in my app. Per Gene's suggestion, I've changed from MUST to SHOULD:

            QueryParser parser = new QueryParser("docurl", new StandardAnalyzer());
            BooleanQuery bquery = new BooleanQuery();
            Query parsedQuery = parser.Parse(query);
            bquery.Add(parsedQuery, Lucene.Net.Search.BooleanClause.Occur.SHOULD);
            int _max = searcher.MaxDoc();
            BooleanQuery.SetMaxClauseCount(Int32.MaxValue);
            TopDocs hits = searcher.Search(parsedQuery, _max);

parsedQuery is simply docfile:tomatoes*

Edit2:

I think I've finally gotten to the root problem:

            QueryParser parser = new QueryParser("docurl", new StandardAnalyzer());
            Query parsedQuery = parser.Parse(query);

In the second line, query is "docfile:Tomatoes*", but parsedQuery is {docfile:tomatoes*}. Notice the difference? Lower case 't' in the parsed query. I never noticed this before. If I change the value in the IDE to 'T', 23 results return.

I've verified that StandardAnalyzer is being used when indexing and reading the index. How do I force queryParser to keep the case of the value of query?

Edit3: Wow, how frustrating. According to the documentation, I can accomplish this with:

parser.setLowercaseExpandedTerms(false);

Whether terms of wildcard, prefix, fuzzy and range queries are to be automatically lower-cased or not. Default is true.

I won't argue whether that's a sensible default or not. I suppose SimpleAnalyzer should have been used to lowercase everything in and out of the index. The frustrating part is, at least with the version I'm using, Luke defaults the other way! At least I learned a bit more about Lucene.
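
For reference, a minimal sketch of that setting in C#. It assumes a Lucene.NET 2.9.x build, where the Java setter is exposed as the method `SetLowercaseExpandedTerms` (later 3.0.x builds appear to expose a `LowercaseExpandedTerms` property instead, so adjust for your version). It only parses and prints the query, so no index is needed; the class name is hypothetical.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;

class LowercaseExpandedTermsDemo
{
    static void Main()
    {
        QueryParser parser = new QueryParser("docurl", new StandardAnalyzer());

        // Default behaviour: prefix/wildcard terms are lower-cased by the parser.
        Query lowered = parser.Parse("docfile:Tomatoes*");
        Console.WriteLine(lowered);   // docfile:tomatoes* (what the debugger showed above)

        // Assumption: 2.9.x-style setter; keeps the case of expanded terms.
        parser.SetLowercaseExpandedTerms(false);
        Query preserved = parser.Parse("docfile:Tomatoes*");
        Console.WriteLine(preserved); // docfile:Tomatoes*
    }
}
```

The setting only affects wildcard, prefix, fuzzy and range terms; ordinary terms still go through the analyzer passed to the parser.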

Dzejms

2 Answers

Using Occur.MUST is equivalent to using the + operator with the standard query parser. Thus your code is evaluating +docurl:www.siteurl.com +docfile:Tomatoes* rather than the expression you typed into Luke. To get the same behavior as Luke, try Occur.SHOULD when adding your clauses.
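
To illustrate the MUST/SHOULD distinction, here is a small sketch that builds the two clauses by hand and prints the resulting query strings. It assumes the Lucene.NET 2.9.x API, where `BooleanClause.Occur` is a nested type as in the question's code; the class and the `Build` helper are hypothetical names, and the field values are the ones from the question.

```csharp
using System;
using Lucene.Net.Index;
using Lucene.Net.Search;

class OccurDemo
{
    // Hypothetical helper: builds the two-clause query with the same Occur on both clauses.
    static BooleanQuery Build(BooleanClause.Occur occur)
    {
        BooleanQuery bquery = new BooleanQuery();
        bquery.Add(new TermQuery(new Term("docurl", "www.siteurl.com")), occur);
        bquery.Add(new PrefixQuery(new Term("docfile", "Tomatoes")), occur);
        return bquery;
    }

    static void Main()
    {
        // MUST clauses print with a leading '+', like the + operator in query syntax.
        Console.WriteLine(Build(BooleanClause.Occur.MUST));   // +docurl:www.siteurl.com +docfile:Tomatoes*

        // SHOULD clauses print with no prefix, matching the expression typed into Luke.
        Console.WriteLine(Build(BooleanClause.Occur.SHOULD)); // docurl:www.siteurl.com docfile:Tomatoes*
    }
}
```

Whichever form you build, it only takes effect if that query object is the one passed to `searcher.Search`.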

Gene Golovchinsky
  • Changing my search expression in Luke to +docurl:www.toledoblade.com +docfile:Tomatoes* does indeed return 0 docs. However, changing my clause to SHOULD doesn't seem to have the reverse effect: with `bquery.Add(parsedQuery, Lucene.Net.Search.BooleanClause.Occur.SHOULD);`, `hits.totalHits` is still 0. – Dzejms May 16 '11 at 12:17
  • Try searching on `bquery` rather than on `parsedQuery`, and, as @ryan mentioned below, see what the class of `parsedQuery` is. Stepping through the code in the debugger is often helpful. – Gene Golovchinsky May 16 '11 at 15:54

QueryParser will indeed take a query like "docurl:www.siteurl.com docfile:Tomatoes*" and build a proper query out of it (boolean query, range query, etc.) depending on the query given (see query syntax).

Your first step should be to attach a debugger and inspect the value and type of parsedQuery.
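
If a debugger is not handy, roughly the same inspection can be done in code. This is only a sketch, assuming the Lucene.NET 2.9.x method-style accessors (`GetClauses()`, `GetQuery()`, `IsRequired()`); newer builds may expose these as properties instead. The class name is hypothetical.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;

class InspectParsedQuery
{
    static void Main()
    {
        QueryParser parser = new QueryParser("docurl", new StandardAnalyzer());
        Query parsedQuery = parser.Parse("docurl:www.siteurl.com docfile:Tomatoes*");

        // Print the runtime type and the string form of the whole query.
        Console.WriteLine(parsedQuery.GetType().Name + ": " + parsedQuery);

        // If the parser produced a BooleanQuery, list each clause it contains.
        BooleanQuery boolQuery = parsedQuery as BooleanQuery;
        if (boolQuery != null)
        {
            foreach (BooleanClause clause in boolQuery.GetClauses())
            {
                Query inner = clause.GetQuery();
                Console.WriteLine("  " + inner.GetType().Name
                    + " required=" + clause.IsRequired()
                    + " -> " + inner);
            }
        }
    }
}
```

Comparing the printed clauses with the structure shown in Luke makes differences in term text, such as case, easy to spot.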

Ryan Ische
  • parsedQuery = docurl:www.toledoblade.com docfile:tomatoes*. So it looks like the QueryParser is doing its thing. – Dzejms May 16 '11 at 12:09
  • What is the type of parsedQuery? It should be a boolean query composed of two other queries (matching the query structure you see in Luke). Two questions: 1 - what is the purpose of bquery (it doesn't seem to be used at all)? 2 - Do you get the same results if you use a different search method (one of the other overloads)? – Ryan Ische May 16 '11 at 13:14
  • You're right about bquery, I never noticed it isn't used. I didn't write this code. Looking into this now. – Dzejms May 16 '11 at 13:32
  • The reason for using `bquery` is to specify more than one search term, which is when the MUST/SHOULD distinction will come into play. – Gene Golovchinsky May 16 '11 at 15:50
  • `bquery` is not used as an argument to the `IndexSearcher` though. – Ryan Ische May 16 '11 at 17:42
  • Changing the last line to TopDocs hits = searcher.Search(bquery, _max); doesn't seem to have helped either. Still 0 hits. – Dzejms May 16 '11 at 19:05
  • Right, I don't think `bquery` is necessary here at all. Sorry, I can see how my last comment might have suggested that. The `QueryParser` will take your query expression (`docurl:www.toledoblade.com docfile:tomatoes*`) and turn it into the proper query structure (a `BooleanQuery` composed of a `TermQuery` and `PrefixQuery`). I would suggest digging into `parsedQuery` in the debugger and looking to see what it is composed of. There should be a queries collection/enumerable/array inside of `parsedQuery`. – Ryan Ische May 16 '11 at 19:18
  • Got it, I see both the TermQuery and PrefixQuery in the clauses collection. Both have an occur member that seems to be blank {} of type Lucene.Net.Search.BooleanClause.Occur. If I prefix both field names with '+' then the value for the occur variable for both is {+}, which based on Gene's comment seems to force MUST. Is there any way to add something in the query expression to force a SHOULD? If not, do you have any suggestions for how else I can default to SHOULD in my parsedQuery BooleanQuery object? – Dzejms May 16 '11 at 19:48
  • The default `BooleanClause` for `QueryParser` is SHOULD, so that's not likely the problem. Without more information, I can't really say why it's not working. All I can suggest is to try different queries and to try an index with only a few documents in it. Just try to control the environment so you can do some experiments. – Ryan Ische May 16 '11 at 20:41
  • You both helped, but Ryan gave more input. Points to him. – Dzejms May 17 '11 at 18:35