I have two field types defined in a Solr schema.xml as follows:

<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>

<fieldType name="myTextField" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.LengthFilterFactory" min="4" max="255"/>
  </analyzer>
</fieldType>

I use them in two fields, as follows:

<field name="exactName" type="string" multiValued="false" indexed="true" required="true" stored="true"/>

<field name="processedName" type="myTextField" indexed="true" stored="true" multiValued="true" />

And finally I defined a handler:

<requestHandler name="/nameSearch" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="defType">edismax</str>
    <str name="qf">exactName^100 processedName^10</str>
    <str name="q.alt">*:*</str>
    <str name="rows">1</str>
    <str name="fl">*,score</str>
  </lst>
</requestHandler>

What I'm trying to achieve is something like what is described here and here: "perfect" matches in the exactName field should score higher than matches in other fields.

The problem is that, when debugging, I can see that my handler does not handle the exactName field correctly in searches. It produces a query like this one:

+((processedName:bob | exactName:bob) (processedName:rivers | exactName:rivers))

Since exactName isn't tokenized, searching it for individual tokens is useless.

If I change my handler by adding:

<str name="pf">exactName^1</str>
<str name="ps">1</str>

Solr seems to ignore it (perhaps pf requires a multiValued field?): the resulting query is unchanged. If I change the handler to

<str name="qf">processedName</str>
<str name="pf">processedName^10</str>
<str name="ps">1</str>

It changes the query as follows:

+((processedName:bob) (processedName:rivers)) (processedName:\"bob rivers\"~1^10.0)

That query looks right, which makes me think the exactName strategy is not valid...

The strategy sounded interesting and matched my needs: a document for "bob rivers" should score higher than one for "bob something rivers" (note that I don't want to suppress the latter, just to boost the exact-name match).

Is it possible to do something like this?

Bob Rivers

1 Answer

Not with the dismax query parser out of the box, since it splits the query string into separate terms, each of which becomes its own query clause.

I'd try the edismax query parser instead, as it allows Lucene query syntax combined with dismax syntax. That way you can search exactName for the exact phrase and let edismax handle the dismax expansion for the rest of the query.

Something like:

defType=edismax&q=exactName:"Bob Rivers"^100 bob rivers&q.op=OR

... could work (it might need a bit of tweaking).

But as you've discovered, you might get the same result by just using phrase field boosts for the processedName field, as that will evaluate to something similar.

Another tweak would be to have two tokenized fields (processedName and exactishName), where the latter gets less processing (for example, no stemming or phonetic filters) and is then given a higher boost in qf and pf.
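
As a rough sketch of that two-field setup, the fragment below adds a lightly analyzed exactishName field next to processedName. The type name exactishText, the copyField rule, and the boost values are assumptions for illustration, not something taken from the question's schema.

```xml
<!-- schema.xml: hypothetical lighter analysis chain (tokenize and lowercase only) -->
<fieldType name="exactishText" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="exactishName" type="exactishText" indexed="true" stored="false" multiValued="true"/>

<!-- assumption: fill exactishName from the same input that feeds processedName -->
<copyField source="processedName" dest="exactishName"/>

<!-- solrconfig.xml, inside the handler's defaults: boost the lighter field more (values are guesses) -->
<str name="qf">exactishName^100 processedName^10</str>
<str name="pf">exactishName^100 processedName^10</str>
```

Because both fields are tokenized, edismax can build per-term and phrase queries against either one, which is what the untokenized exactName field could not support.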

For edismax there are also pf2 and pf3, which relax pf's requirement that all terms be present in order, so that a phrase match on just two or three adjacent terms, respectively, is enough.
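
A minimal sketch of how pf2 and pf3 could sit alongside pf in the handler from the question. The boost values here are arbitrary assumptions and would need tuning against real data.

```xml
<requestHandler name="/nameSearch" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">processedName</str>
    <!-- boost documents where the whole query matches as one phrase -->
    <str name="pf">processedName^10</str>
    <!-- boost phrase matches on adjacent word pairs and triples from the query -->
    <str name="pf2">processedName^3</str>
    <str name="pf3">processedName^5</str>
    <!-- phrase slop applied to the pf phrase queries -->
    <str name="ps">1</str>
  </lst>
</requestHandler>
```

With a two-word query such as "bob rivers", pf and pf2 build the same phrase, so pf2 and pf3 only start to add something distinct for longer queries.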

MatsLindh