
I have been having some difficulty with Lucene and would appreciate any help.

I have a custom, manually written query that I parse using QueryParser.Parse. I am using version LUCENE_29 and the StandardAnalyzer.

In my query I have a special character (colon) and need this to remain:

+(Name:"test\:word" OR Business:"test\:word hello")

The output after parsing the query text above is:

+(Name:"test word" OR Business:"test word hello")

Does anyone have any suggestions? I tried passing an empty stop word collection to the StandardAnalyzer constructor, but that has no effect: the colon is still stripped out.

Thank you.

Rajan Mishra
H S
  • You ask a good question. I had a similar problem with Lucene and found no way to resolve this issue. Lucene was retired on our website partly due to this issue. – JohnH Sep 12 '17 at 17:02
  • @JohnH thanks for sharing this info! – H S Sep 12 '17 at 17:04
  • FYI - `LUCENE_29` only tells us the version compatibility you have set, it doesn't tell us what lucene or lucene.net version you are using. – NightOwl888 Sep 13 '17 at 12:25

1 Answer


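You can't, at least not with StandardAnalyzer. It was specifically designed to remove special characters: as your output shows, its tokenizer treats the colon as a separator and discards it. Stop words are a separate step that removes whole tokens, which is why passing an empty stop word collection changes nothing here.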

The answer is to use an Analyzer implementation that doesn't strip special characters (such as WhitespaceAnalyzer), or to build a custom analyzer from existing tokenizers and filters to meet your needs.

Note that you would also need to index your data with WhitespaceAnalyzer so that those special characters end up in the index; otherwise they won't be available at query time.
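
To make the difference concrete, here is a minimal Java sketch of the two tokenization strategies. It deliberately does not use Lucene: `AnalyzerSketch`, `whitespaceTokens` and `punctuationDroppingTokens` are hypothetical names made up for this illustration, and the second method is only a rough approximation of what StandardAnalyzer does (the real grammar is more elaborate). The input is the phrase text as the analyzer sees it, after the query parser has already removed the backslash escape.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy stand-ins for the two tokenization strategies; these are NOT Lucene classes.
class AnalyzerSketch {

    // Whitespace-style tokenization: a token is any run of non-whitespace characters,
    // so punctuation such as ':' stays inside the token and case is preserved.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // Rough approximation (an assumption, not Lucene's actual grammar) of a
    // punctuation-dropping analyzer: lowercase, then split on anything that is
    // not a letter or a digit.
    static List<String> punctuationDroppingTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Phrase text as the analyzer receives it (escape already removed by the parser).
        String phrase = "test:word hello";
        System.out.println(whitespaceTokens(phrase));          // [test:word, hello]
        System.out.println(punctuationDroppingTokens(phrase)); // [test, word, hello]
    }
}
```

The second output matches what you are seeing: the colon is gone before any stop word handling could matter. Note also that with whitespace-style tokenization "test:word" is a single token, so a quoted phrase containing only that one word has nothing left to be a phrase of, and the parser can hand it back as a plain term.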

NightOwl888
  • Hi, I have used the WhitespaceAnalyzer, and query.parse now results in: +(Name:test:word Name:"test:word hello"). This query works, but I don't understand why the WhitespaceAnalyzer is stripping the quotation marks from the Name field but leaving them on the Business field. Any ideas? – H S Sep 13 '17 at 09:13
  • Have you used `WhitespaceAnalyzer` at index time? The analyzed data needs to be written to the index with the special characters, otherwise they won't be available at query time. – NightOwl888 Sep 13 '17 at 12:23