
I am aware that the Lucene documentation says:

Note: The NOT operator cannot be used with just one term. For example, the following search will return no results:

NOT "jakarta apache"

However, I would like to be able to form a query that returns all documents NOT containing a term. I have looked into stringing together a MatchAllDocsQuery and a TermQuery into a BooleanQuery, but I cannot seem to find the right combination.

If I index the following two documents

Doc0: content:The quick brown fox jumps over the lazy dog.
Doc1: (empty string)

The query *:* -content:fox returns both documents when I just want one document.

The RegexQuery content:^((?!fox).)*$, suggested by this StackOverflow answer, returns one document, but it does not seem to be working correctly: content:^((?!foo).)*$ also returns one document, when I expect it to return two.

I am aware of the performance implications of what I want to do. The query will only be run on a few documents so I am not too worried about performance.

Is there a way to write a Lucene query to get what I want?

BennyMcBenBen

2 Answers


You can match every document and then exclude the term:

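IndexSearcher searcher = new IndexSearcher("path_to_index");
// Positive clause: matches every document in the index.
MatchAllDocsQuery everyDocClause = new MatchAllDocsQuery();
// Negative clause: the term that matching documents must not contain.
TermQuery termClause = new TermQuery(new Term("text", "exclude_term"));
BooleanQuery query = new BooleanQuery();
query.add(everyDocClause, BooleanClause.Occur.MUST);
query.add(termClause, BooleanClause.Occur.MUST_NOT);
// Hits is the legacy (pre-3.0) result type; later versions return TopDocs from search(query, n).
Hits hits = searcher.search(query);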

Alternatively, add a dummy field with some fixed value to every document and use the query

+dummy_field:dummy_value -exclude_term
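
Both approaches above rest on the same idea: a prohibited clause only removes documents from a candidate set, so something positive has to supply that set first. The following is a minimal plain-Java sketch of that set logic, not Lucene code. The class name NegativeQuerySketch, its methods, and the crude tokenizer are all hypothetical stand-ins, and the two documents are the ones from the question.

```java
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

public class NegativeQuerySketch {
    // The two documents from the question; Doc1 is the empty string.
    static final String[] DOCS = {
        "The quick brown fox jumps over the lazy dog.",
        ""
    };

    // Crude stand-in for an analyzer (an assumption): lowercase, split on non-letters.
    public static Set<String> tokens(String text) {
        Set<String> out = new TreeSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Models the match-all (or dummy-field) clause: every document ID is a candidate.
    public static Set<Integer> matchAll() {
        Set<Integer> ids = new TreeSet<>();
        for (int i = 0; i < DOCS.length; i++) {
            ids.add(i);
        }
        return ids;
    }

    // Models a prohibited clause: it can only remove candidates, never add them.
    public static Set<Integer> exclude(Set<Integer> candidates, String term) {
        Set<Integer> kept = new TreeSet<>(candidates);
        kept.removeIf(id -> tokens(DOCS[id]).contains(term));
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(exclude(matchAll(), "fox"));      // [1]
        System.out.println(exclude(matchAll(), "foo"));      // [0, 1]
        System.out.println(exclude(new TreeSet<>(), "fox")); // []
    }
}
```

The last call shows the situation the documentation note describes: with no positive clause there are no candidates to subtract from, so the result is empty.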
Jayendra
  • Both of your answers work for me. MatchAllDocsQuery is preferred. At first I implemented MatchAllDocsQuery and it did not work so I asked this Question. When I got the dummy field working, I switched back to MatchAllDocsQuery for a sanity check and it worked. I am not sure why it did not work for me before. I must have gotten one of the steps wrong. – BennyMcBenBen Nov 08 '11 at 16:06

Can't you append an "artificial" token to each document and then search for "'added token' and not 'what you want to avoid'"?

Tudor