string q = "a";
// QueryParser turns the trailing * into a PrefixQuery on the "company" field
Query query = new QueryParser("company", new StandardAnalyzer()).Parse(q + "*");

This results in query being a PrefixQuery: company:a*

Yet I still get results like "Fleet Africa", where the A is clearly not at the start of the value, so these results are unwanted.

Query query = new TermQuery(new Term("company", q+"*"));

This results in a TermQuery, company:a*, that returns no results at all. Probably this is because it is treated as an exact match, and none of my values is the literal "a*".

Query query = new WildcardQuery(new Term("company", q+"*"));

This returns the same results as the PrefixQuery.

What am I doing wrong?

Boris Callens

3 Answers


StandardAnalyzer will tokenize "Fleet Africa" into "fleet" and "africa". Your a* search will match the latter term.

If you want "Fleet Africa" to be treated as a single term, use an analyzer that does not split your string on whitespace. KeywordAnalyzer is one example, but you may still want to lowercase your data so that queries are case-insensitive.
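To make the difference concrete, here is a toy model in plain C# (no Lucene involved) of what ends up in the index for one field value, and how a prefix query checks it. The class and method names (PrefixToy, TokenizeLikeStandard, KeywordLowercase, PrefixMatches) are hypothetical, and the tokenizer is only a rough stand-in for StandardAnalyzer, assuming it splits on non-alphanumeric characters and lowercases.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Toy model (not the Lucene API): the "index" for a field is just the list of terms produced for its value.
public static class PrefixToy
{
    // Rough stand-in for StandardAnalyzer: split on runs of non-letter/non-digit characters, then lowercase.
    public static List<string> TokenizeLikeStandard(string value) =>
        Regex.Split(value, @"[^\p{L}\p{N}]+")
             .Where(t => t.Length > 0)
             .Select(t => t.ToLowerInvariant())
             .ToList();

    // Stand-in for KeywordAnalyzer plus lowercasing: the whole value becomes one single term.
    public static List<string> KeywordLowercase(string value) =>
        new List<string> { value.ToLowerInvariant() };

    // A prefix query matches the document if ANY indexed term starts with the (already lowercased) prefix.
    public static bool PrefixMatches(IEnumerable<string> terms, string prefix) =>
        terms.Any(t => t.StartsWith(prefix, StringComparison.Ordinal));
}
```

With the tokenized model, "Fleet Africa" yields the terms "fleet" and "africa", so the prefix "a" matches through "africa". With the single-term model the only term is "fleet africa", so "a" no longer matches while "f" still does.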

sisve

The short answer: none of your queries constrains the search to the start of the field. You need an EdgeNGramTokenFilter or something like it. See this question for an implementation of autocomplete in Lucene.
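The idea behind an edge n-gram index can be sketched without Lucene: at index time you store the leading substrings of the value, so "starts with" turns into an exact term lookup (the kind of thing a TermQuery does). The code below is a hypothetical toy (EdgeNGramToy, EdgeNGrams and StartsWithViaGrams are illustrative names, not Lucene.Net APIs). It assumes the grams are built from the whole lowercased value, which in Lucene terms would mean a keyword-style tokenizer in front of the gram filter; if the value were first split into words, the grams would be per word and "a" would match "africa" again.

```csharp
using System;
using System.Collections.Generic;

// Toy model (not the Lucene API) of edge n-gram indexing over the WHOLE field value.
public static class EdgeNGramToy
{
    // Returns the leading substrings of the lowercased value with lengths minGram..maxGram.
    // Assumes minGram >= 1; grams longer than the value are simply not produced.
    public static List<string> EdgeNGrams(string value, int minGram, int maxGram)
    {
        string s = value.ToLowerInvariant();
        var grams = new List<string>();
        for (int len = minGram; len <= Math.Min(maxGram, s.Length); len++)
            grams.Add(s.Substring(0, len));
        return grams;
    }

    // With the grams indexed, "starts with" is an exact lookup of the lowercased user input.
    public static bool StartsWithViaGrams(List<string> grams, string typed) =>
        grams.Contains(typed.ToLowerInvariant());
}
```

For "Fleet Africa" with grams of length 1 to 5, the indexed terms are "f", "fl", "fle", "flee" and "fleet": typing "Fle" matches, typing "a" does not. The trade-off is index size and the fact that input longer than maxGram finds nothing in this toy.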

Yuval F
  • Surely the example is too far-fetched, right? Isn't it possible to create a StartsWith-like query without all the fuss? – Boris Callens Mar 03 '09 at 11:11
  • Not that I know of. startswith is tricky. If you manage to do this, please let me know. From what I see, PrefixQuery means looking for the start of any term, not just the first. – Yuval F Mar 03 '09 at 11:44
  • This actually surprises me. StartsWith must be the easiest query to do, no? – Boris Callens Mar 03 '09 at 12:22
  • I have exactly the opposite problem: for me Lucene performs `StartsWith` by default, but I want a `Contains` and I don't know how to achieve this. What version/analyzer are you using? I'm using 2.9/StandardAnalyzer. My question is located at: http://stackoverflow.com/questions/5484965/howto-perform-a-contains-search-rather-than-starts-with-using-lucene-net – ntziolis Mar 30 '11 at 10:34

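Another solution could be to use a StringField to store the data, for example "Fleet Africa", and then use a WildcardQuery. Now F* would give results but A* won't. Note that a StringField value is not lowercased either, so the match is case-sensitive: f* only matches if you lowercase the value at index time and the prefix at query time.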

StringField is indexed but not tokenized.
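Because a StringField is not analyzed, the original value is the one and only indexed term, and a pattern with a single trailing * behaves like an ordinal "starts with" on that term. The plain C# sketch below models just that behavior; StringFieldToy and TrailingWildcardMatches are hypothetical names, and only patterns of the form prefix* (or an exact value) are handled.

```csharp
using System;

// Toy model (not the Lucene API) of a trailing-* wildcard against an untokenized field.
public static class StringFieldToy
{
    // indexedTerm is the exact value as stored; no lowercasing or splitting happens.
    public static bool TrailingWildcardMatches(string indexedTerm, string pattern)
    {
        // Without a trailing *, this degenerates to an exact, case-sensitive term match.
        if (!pattern.EndsWith("*", StringComparison.Ordinal))
            return string.Equals(indexedTerm, pattern, StringComparison.Ordinal);

        // "F*" -> prefix "F", compared ordinally (case-sensitive) against the start of the whole value.
        string prefix = pattern.Substring(0, pattern.Length - 1);
        return indexedTerm.StartsWith(prefix, StringComparison.Ordinal);
    }
}
```

Against the stored value "Fleet Africa", "F*" matches while "f*" and "A*" do not; if the value is lowercased to "fleet africa" when indexing, a lowercased "f*" matches instead.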

GPuri