collective.solr has a great feature: you can query Solr using Lucene query syntax directly from the Plone search.
Query Parser Syntax:
--> https://lucene.apache.org/core/2_9_4/queryparsersyntax.html
collective.solr runs a simple test to decide whether it should mangle your search query using the collective.solr settings, or pass it through to Solr as a plain Lucene query.
The test itself is really simple, but the mangle code is hard to understand (at least for me):
from re import compile, UNICODE

simpleTerm = compile(r'^[\w\d]+$', UNICODE)
...
simpleCharacters = compile(r'^[\w\d\?\*\s]+$', UNICODE)
If your term doesn't match, collective.solr assumes you're trying to run a query using Lucene syntax, and that's why it shows no results in your case.
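To illustrate, here is a quick check of how these patterns classify a few terms (plain Python `re`; the `UNICODE` flag is the default for str patterns in Python 3, so it is omitted):

```python
import re

# Patterns as shown in collective.solr's query parser
simpleTerm = re.compile(r'^[\w\d]+$')
simpleCharacters = re.compile(r'^[\w\d\?\*\s]+$')

for term in ('plone', 'foo bar*', 'foo.bar', 'foo-bar', 'title:foo'):
    simple = bool(simpleCharacters.match(term))
    print(term, '-> simple term' if simple else '-> treated as lucene query')
```

Only the first two terms count as "simple"; anything containing a dot, hyphen, colon, etc. falls through to the Lucene branch.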
I faced the same problem a few weeks ago; you have the following options:
- Simply add the dot to the pattern, so collective.solr no longer treats search terms containing dots as a Lucene query.
- Prepare your search term before passing it to collective.solr.
The first option is just a quick win, because sooner or later someone will search for a term containing a comma, semicolon, quotes, etc.
I personally sanitized the search term before passing it to the search.
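A minimal sketch of that second option, assuming you strip everything the simpleCharacters pattern would reject (sanitize_term is a hypothetical helper, not part of the collective.solr API):

```python
import re

def sanitize_term(term):
    """Replace every character the simpleCharacters pattern rejects
    with a space, so collective.solr always treats the query as a
    plain search term instead of a Lucene query."""
    cleaned = re.sub(r'[^\w\?\*\s]', ' ', term)
    # collapse the runs of whitespace left behind by the substitution
    return ' '.join(cleaned.split())

print(sanitize_term('foo.bar, baz;'))  # -> foo bar baz
```

After this, terms with dots, commas, quotes, and so on degrade gracefully to a multi-word search instead of an empty Lucene result.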
AFAIK the Solr tokenizer also removes several non-alphanumeric characters.
This SO answer explains how the default tokenizer works:
- Splits words at punctuation characters, removing punctuation. However, a dot that's not followed by whitespace is considered part of a token.
- Splits words at hyphens, unless there's a number in the token. In that case, the whole token is interpreted as a product number and is not split.
- Recognizes email addresses and Internet hostnames as one token.
So it's up to you how you want to handle non-alphanumeric terms :-)
If you never want to use Lucene query syntax, the best solution would be to preprocess the terms similarly to the tokenizer.
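A rough sketch of such a preprocessor, following the tokenizer rules quoted above (tokenizer_like is a hypothetical helper and only approximates the real Solr tokenizer):

```python
import re

def tokenizer_like(term):
    """Approximate the quoted rules: split at whitespace, drop trailing
    punctuation, keep a dot that is followed by more characters, split at
    hyphens unless the token contains a digit (product numbers)."""
    tokens = []
    for raw in term.split():
        if any(ch.isdigit() for ch in raw):
            # tokens with a number are kept whole, minus edge punctuation
            tokens.append(raw.strip('.,;:!?'))
        else:
            # strip trailing punctuation, then split at hyphens
            cleaned = re.sub(r'[.,;:!?]+$', '', raw)
            tokens.extend(t for t in cleaned.split('-') if t)
    return tokens

print(tokenizer_like('wi-fi XC-72R plone.org rocks!'))
# -> ['wi', 'fi', 'XC-72R', 'plone.org', 'rocks']
```

Joining the resulting tokens with spaces gives you a term that always passes the simpleCharacters test.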