4

I'm currently running into one issue over and over again. I am using collective.solr 4.1.0 on our Plone 4.2.6 system. Submitting a search works fine as long as there is no special character in the search box. For example, Prof Dr Mathew Rogers works just fine and returns good results, such as a person 'Prof. Dr. Mathew Rogers'.

When I submit the search Prof. Dr. Mathew Rogers, Solr won't return any results.

I checked all other questions on this platform regarding this or similar problems, but none was answered properly. Does anyone have an idea why the Solr query breaks when I search for something containing, for example, a dot? Help would be greatly appreciated!

artemis_clyde

1 Answer

2

A great feature of collective.solr is that you can query Solr using Lucene query syntax straight from the Plone search.

Query Parser Syntax: https://lucene.apache.org/core/2_9_4/queryparsersyntax.html

collective.solr has a simple test to decide whether it should mangle your search query according to the collective.solr settings, or pass it through to Solr as a raw Lucene query.

The test is really simple, but the mangling code is hard to understand (at least for me):

simpleTerm = compile(r'^[\w\d]+$', UNICODE)

...

simpleCharacters = compile(r'^[\w\d\?\*\s]+$', UNICODE)
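A minimal sketch in plain Python (the two patterns are copied from the snippet above; the helper name `is_simple` is mine, not collective.solr's) showing why the dotted query falls through to raw Lucene handling:

```python
from re import compile, UNICODE

# Patterns as quoted from collective.solr's query mangling code
simpleTerm = compile(r'^[\w\d]+$', UNICODE)
simpleCharacters = compile(r'^[\w\d\?\*\s]+$', UNICODE)

def is_simple(query):
    """True if the query consists only of 'simple' characters and
    would be mangled by collective.solr; False means it is passed
    through to Solr as a raw Lucene query."""
    return bool(simpleCharacters.match(query))

print(is_simple('Prof Dr Mathew Rogers'))    # True  -> mangled, results found
print(is_simple('Prof. Dr. Mathew Rogers'))  # False -> treated as raw Lucene query
```

The dots are not in the `simpleCharacters` class, so the second query is handed to Solr unmangled, and the Lucene query parser makes nothing useful of it.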

If your term doesn't match, collective.solr assumes you are writing a raw Lucene query and passes it through unchanged, which is why you get no results in your case.

I faced the same problem a few weeks ago; you have the following options:

  1. Simply add the dot to the pattern, so collective.solr no longer treats search terms containing dots as Lucene queries.
  2. Prepare your search term before passing it to collective.solr.
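Option 1 amounts to widening the `simpleCharacters` pattern. A hypothetical variant that also accepts dots (this is my sketch, not a patch shipped by collective.solr):

```python
from re import compile, UNICODE

# Hypothetical widened pattern: '.' added to the character class,
# so dotted names still count as plain search terms
simpleCharacters = compile(r'^[\w\d\?\*\s\.]+$', UNICODE)

print(bool(simpleCharacters.match('Prof. Dr. Mathew Rogers')))  # True
```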

The first option is just a quick win, because sooner or later someone will search for a term containing a comma, semicolon, quotes, etc.

I personally customized the search term before passing it to the search.

Afaik the Solr tokenizer also removes several non-alphanumeric characters.

This SO answer explains how the default tokenizer works:

> Splits words at punctuation characters, removing punctuations. However, a dot that's not followed by whitespace is considered part of a token. Splits words at hyphens, unless there's a number in the token. In that case, the whole token is interpreted as a product number and is not split. Recognizes email addresses and Internet hostnames as one token.

So it's up to you how you want to handle non-alphanumeric terms :-)

If you never want to use the Lucene query syntax, the best solution is to prepare the terms similarly to the tokenizer.
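As a rough sketch of that preparation step (the helper name `prepare_term` is hypothetical, and this only approximates the tokenizer: it turns all punctuation into whitespace rather than reproducing the dot-before-non-whitespace and product-number rules quoted above):

```python
import re

def prepare_term(term):
    """Hypothetical cleanup helper: replace punctuation with spaces
    and collapse whitespace, so the query stays a 'simple' one for
    collective.solr instead of being passed through as raw Lucene."""
    cleaned = re.sub(r'[^\w\?\*\s]+', ' ', term, flags=re.UNICODE)
    return ' '.join(cleaned.split())

print(prepare_term('Prof. Dr. Mathew Rogers'))  # Prof Dr Mathew Rogers
```

Wildcards (`?`, `*`) are deliberately kept, since the simple-characters check accepts them anyway.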

Mathias
  • Thank you for your answer! I just realized that for one field I had the wrong `field_type` which had no tokenizer. I fixed that, still queries give no reply. Yet on another server where all fields are tokenized it works. Is this because the indexed value also has to be tokenized? I __didn't yet reindex__ on my new server. But I thought only altering the schema.xml would work since I'm only concerned about the query. – artemis_clyde Oct 28 '16 at 11:12
  • You need to prepare the query delivered to Solr in advance, like in the `SearchViewlet`. Since Solr doesn't search explicitly for an `'`, you can remove it before querying Solr. A regex or a simple replace would do the job in your case. – Mathias Oct 28 '16 at 12:01