12

I have a Solr index named LocationIndex with the following fields:

<fields>
    <field name="solr_id" type="string" stored="true" required="true" indexed="true"/>
    <field name="solr_ver" type="string" stored="true" required="true" indexed="true" default="0000"/>
    <!-- and some more fields -->
</fields>
<uniqueKey>solr_id</uniqueKey>

Now I want to change the schema so that the unique key is a composite of two existing fields, solr_id and solr_ver, something like this:

<fields>
    <field name="solr_id" type="string" stored="true" required="true" indexed="true"/>
    <field name="solr_ver" type="string" stored="true" required="true" indexed="true" default="0000"/>
    <field name="composite-id" type="string" stored="true" required="true" indexed="true"/>
    <!-- and some more fields -->
</fields>
<uniqueKey>solr_ver-solr_id</uniqueKey>

After searching, I found that this is possible by adding the following to the schema (ref: Solr Composite Unique key from existing fields in schema):

<updateRequestProcessorChain name="composite-id">
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">docid_s</str>
    <str name="source">userid_s</str>
    <str name="dest">id</str>
  </processor>
  <processor class="solr.ConcatFieldUpdateProcessorFactory">
    <str name="fieldName">id</str>
    <str name="delimiter">--</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

So I changed my schema, and it now looks like this:

<updateRequestProcessorChain name="composite-id">
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">solr_ver</str>
    <str name="source">solr_id</str>
    <str name="dest">id</str>
  </processor>
  <processor class="solr.ConcatFieldUpdateProcessorFactory">
    <str name="fieldName">id</str>
    <str name="delimiter">-</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

<fields>
    <field name="solr_id" type="string" stored="true" required="true" indexed="true"/>
    <field name="solr_ver" type="string" stored="true" required="true" indexed="true" default="0000"/>
    <field name="id" type="string" stored="true" required="true" indexed="true"/>
    <!-- and some more fields -->
</fields>
<uniqueKey>id</uniqueKey>

But when I add a document, I get this error:

org.apache.solr.client.solrj.SolrServerException: Server at http://localhost:8983/solr/LocationIndex returned non ok status:400, message:Document [null] missing required field: id

What changes to the schema are required to make this work as desired?

Each document I add contains the fields solr_ver and solr_id. How and where will Solr create the id field by combining these two fields, as in solr_ver-solr_id?

EDIT:

This link explains how to refer to the chain, but I can't work out how it would be used in the schema. Where should I make the changes?

N D Thokare

3 Answers

10

So it looks like you have your updateRequestProcessorChain defined appropriately and it should work. However, you need to add this to the solrconfig.xml file and not the schema.xml. The additional link you provided shows you how to modify your solrconfig.xml file and add your defined updateRequestProcessorChain to the current /update request handler for your solr instance.

So do the following:

  1. Move your <updateRequestProcessorChain> to your solrconfig.xml file.
  2. Update the <requestHandler name="/update" class="solr.UpdateRequestHandler"> entry in your solrconfig.xml file and modify it so it looks like the following:

    <requestHandler name="/update" class="solr.UpdateRequestHandler">
       <lst name="defaults">
          <str name="update.chain">composite-id</str>
       </lst>
    </requestHandler>
    

This should then execute your defined update chain and populate the id field when new documents are added to the index.
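
To make the two steps of the chain concrete, here is a minimal plain-Java sketch of what the clone and concat processors are described as doing to a document. It is an illustration only: it uses no Solr classes, and the class name `CompositeIdChainSketch` and its methods are hypothetical names, not part of Solr. The field names and the `-` delimiter are taken from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: a stand-in for the effect of the "composite-id" chain.
// Nothing here is Solr API; a document is modelled as field name -> values.
class CompositeIdChainSketch {

    // Step 1 (clone): append the values of each source field, in order, to dest.
    static void cloneFields(Map<String, List<String>> doc, List<String> sources, String dest) {
        List<String> target = doc.computeIfAbsent(dest, k -> new ArrayList<>());
        for (String source : sources) {
            target.addAll(doc.getOrDefault(source, Collections.<String>emptyList()));
        }
    }

    // Step 2 (concat): collapse a multi-valued field into one delimited value.
    static void concatField(Map<String, List<String>> doc, String field, String delimiter) {
        List<String> values = doc.get(field);
        if (values != null && values.size() > 1) {
            String joined = String.join(delimiter, values);
            doc.put(field, new ArrayList<>(Collections.singletonList(joined)));
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> doc = new LinkedHashMap<>();
        doc.put("solr_id", new ArrayList<>(Arrays.asList("42")));
        doc.put("solr_ver", new ArrayList<>(Arrays.asList("0000")));

        cloneFields(doc, Arrays.asList("solr_ver", "solr_id"), "id");
        concatField(doc, "id", "-");

        System.out.println(doc.get("id")); // prints [0000-42]
    }
}
```

The order of the source entries determines the order of the parts in the resulting id, which is why solr_ver is listed before solr_id in the question's chain.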

Paige Cook
  • I updated as you said and hope that's correct.. But now I'm getting `class not found` error for `CloneFieldUpdateProcessorFactory`. Is this feature not available for older solr versions? I'm using solr whose specifications are: `Solr Specification Version: 3.4.0.2011.09.09.09.06.17`, `Solr Implementation Version: 3.4.0 1167142 - mike - 2011-09-09 09:06:17`. – N D Thokare Jul 24 '13 at 07:05
  • I just looked at the Solr source and unfortunately, the `CloneFieldUpdateProcessorFactory` is only available in Solr 4.x versions and is not included with the Solr 3.x versions. Sorry. – Paige Cook Jul 24 '13 at 15:33
  • I tried it and I am getting this error: `Document is missing mandatory uniqueKey field: composite-id`. Do we have to define this composite-id in the document? – Nipun Oct 13 '15 at 11:37
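
For the Solr 3.x case raised in the comments, where `CloneFieldUpdateProcessorFactory` is not available, one possible workaround is to build the composite value on the client before sending the document. The following is a minimal sketch under that assumption, not something proposed in the thread. The class `ClientSideCompositeId` and its `build` method are hypothetical, and the `"0000"` fallback simply mirrors the `default` attribute of solr_ver in the question's schema.

```java
// Sketch of a client-side workaround: compute the id before indexing.
// Hypothetical helper; it does not depend on SolrJ.
class ClientSideCompositeId {

    // Assumption: mirrors default="0000" on the solr_ver field in the schema.
    static final String DEFAULT_VER = "0000";

    // Returns "<solr_ver>-<solr_id>", falling back to the default version.
    static String build(String solrVer, String solrId) {
        if (solrId == null || solrId.isEmpty()) {
            throw new IllegalArgumentException("solr_id is required");
        }
        String ver = (solrVer == null || solrVer.isEmpty()) ? DEFAULT_VER : solrVer;
        return ver + "-" + solrId;
    }

    public static void main(String[] args) {
        System.out.println(build("0007", "42")); // prints 0007-42
        System.out.println(build(null, "42"));   // prints 0000-42
    }
}
```

With SolrJ, the result would then be set on the document alongside the other fields, for example `doc.addField("id", ClientSideCompositeId.build(ver, id))`, so that the required id field is already present when the document reaches Solr.
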
4

The solution described above has a limitation: the "dest" value may exceed the maximum length if the concatenated fields are too long. Another option is to generate the id as a signature of the fields. Solr ships signature classes such as MD5Signature (a class capable of generating a signature String from the concatenation of a group of specified document fields; a 128-bit hash used for exact duplicate detection). Note that the example below configures Lookup3Signature as its signatureClass:

<!-- An example dedup update processor that creates the "id" field on the fly 
     based on the hash code of some other fields.  This example has 
     overwriteDupes set to false since we are using the id field as the 
     signatureField and Solr will maintain uniqueness based on that anyway. --> 
<updateRequestProcessorChain name="dedupe"> 
  <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory"> 
    <bool name="enabled">true</bool> 
    <bool name="overwriteDupes">false</bool> 
    <str name="signatureField">id</str> 
    <str name="fields">name,features,cat</str> 
    <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str> 
  </processor> 
  <processor class="solr.LogUpdateProcessorFactory" /> 
  <processor class="solr.RunUpdateProcessorFactory" /> 
</updateRequestProcessorChain> 

From here: http://lucene.472066.n3.nabble.com/Solr-duplicates-detection-td506230.html
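
To illustrate the idea behind a hash-based id, here is a minimal plain-Java sketch that derives a fixed-length id from several field values using MD5 from the JDK. It is not Solr's signature implementation and may not produce the same values as `MD5Signature`; the class `HashedIdSketch` and the zero-byte separator are assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustration only: derive a fixed-length id from field values via MD5.
// Hypothetical helper, independent of Solr's own signature classes.
class HashedIdSketch {

    // Returns a 32-character lowercase hex string (128 bits).
    static String md5Id(String... fieldValues) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            for (String value : fieldValues) {
                md5.update(value.getBytes(StandardCharsets.UTF_8));
                // Separator byte so ("ab", "c") and ("a", "bc") hash differently.
                md5.update((byte) 0);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md5.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(md5Id("0000", "42"));
    }
}
```

The id length stays constant regardless of how long the source fields are, which is the property this answer is after. The trade-off is that the id is no longer human-readable.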

Maksim
  • I tried this solution and still it gives me Document is missing mandatory uniqueKey "id" – Nipun Oct 14 '15 at 05:38
2

I'd like to add this as a comment, but it's impossible to get the creds these days... anyway, here is a better link: https://wiki.apache.org/solr/Deduplication

Dan