
If I have 2 records in my Solr index with the same email address and my keyword search returns both of them in the result set, how can I display only one of them? For example:

Record 1:

<doc>
     <id>123</id>
     <name>Adil Malik</name>
     <email>abc@hotmail.com</email>
     <jobtitle>Software Engineer</jobtitle>
</doc>

Record 2:

<doc>
     <id>456</id>
     <name>Adil Malik</name>
     <email>abc@hotmail.com</email>
     <jobtitle>Database Developer</jobtitle>
</doc>

If we search for "abc@hotmail.com", Solr returns both records, but I want to display just one of them (either one). How can I write the Solr query so that only one record is returned when 2 records share the same email address?

NOTE: I want to keep both records in my Solr index.


In reply to @Layke

[Screenshot attached; image not available]

Adil Malik
  • I want to keep the duplicate records in my index, because if someone searches with Job Title: "Software Engineer" or with Job Title: "Database Developer", "Adil Malik" should be returned in both cases. – Adil Malik Oct 30 '12 at 17:37
  • But if someone searches on a common field such as email: abc@hotmail.com, Solr returns 2 records, with ids 123 and 456. In that case I just want to display either one of them. – Adil Malik Oct 30 '12 at 17:42

2 Answers


You should read up on FieldCollapsing and also on Deduplication. (Deduplication prevents duplicate documents from entering the index at all, which isn't what you want, but I'll leave it here for other readers for whom it might be suitable.)

To use FieldCollapsing, add the parameters group=true and group.field=email to your query.
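As a rough illustration of the grouping parameters, here is a minimal Python sketch that builds such a request and picks one document per email from a grouped response. The host, port, and core name (`localhost:8983`, `collection1`), the helper names, and the sample response are assumptions made for this example, not details from the question; the response layout follows the default (non-simplified) grouped format.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: adjust host, port, and core name to your setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/collection1/select"


def build_grouped_query(keyword, group_field="email"):
    """Build a select URL that collapses results on group_field."""
    params = {
        "q": keyword,
        "group": "true",             # turn on result grouping (field collapsing)
        "group.field": group_field,  # collapse documents sharing this field value
        "group.limit": 1,            # keep one document per group
        "wt": "json",
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)


def first_doc_per_group(response, group_field="email"):
    """Return the top document of each group in a grouped JSON response."""
    groups = response["grouped"][group_field]["groups"]
    return [g["doclist"]["docs"][0] for g in groups if g["doclist"]["docs"]]


# Hand-written sample response for illustration only (not real Solr output).
sample_response = {
    "grouped": {
        "email": {
            "matches": 2,
            "groups": [
                {
                    "groupValue": "abc@hotmail.com",
                    "doclist": {
                        "numFound": 2,
                        "start": 0,
                        "docs": [{"id": "123", "name": "Adil Malik"}],
                    },
                }
            ],
        }
    }
}

url = build_grouped_query("abc@hotmail.com")
docs = first_doc_per_group(sample_response)
```

Both records stay in the index; only the result list is collapsed, so a search on the job title still finds each record individually.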

However, looking at the example documents you provided, I would say that your schema is designed wrong, and what you actually want is to use multi-valued fields.

Read the following question; it might explain how you could have used multi-valued fields (MVF) instead:

What is the use of "multiValued" field type in Solr?
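To make the multi-valued suggestion concrete, here is a small Python sketch of how the two example records could be merged into one document before indexing, with `jobtitle` holding a list of values. The `merge_by_email` helper and the choice to keep the first record's `id` and `name` are assumptions for illustration, and the `jobtitle` field would also need to be declared multiValued in the schema.

```python
# The two records from the question, as plain dictionaries.
records = [
    {"id": "123", "name": "Adil Malik", "email": "abc@hotmail.com",
     "jobtitle": "Software Engineer"},
    {"id": "456", "name": "Adil Malik", "email": "abc@hotmail.com",
     "jobtitle": "Database Developer"},
]


def merge_by_email(records):
    """Merge records sharing an email into one doc with a list of job titles.

    Assumption: the first record seen for an email supplies the id and name.
    """
    merged = {}
    for rec in records:
        doc = merged.setdefault(rec["email"], {
            "id": rec["id"],
            "name": rec["name"],
            "email": rec["email"],
            "jobtitle": [],  # multi-valued field: one entry per job title
        })
        if rec["jobtitle"] not in doc["jobtitle"]:
            doc["jobtitle"].append(rec["jobtitle"])
    return list(merged.values())


docs_to_index = merge_by_email(records)
```

With a single document per email, a search on either job title or on the email returns one result, at the cost of no longer storing the records separately.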

Layke
  • I agree, use a multiValued field type for your jobtitle field. – Paige Cook Oct 31 '12 at 02:44
  • About the schema: when I was designing it I did have multi-valued fields in mind, but I cannot use them because of system requirements. This is a really extensive system and too many things are involved that I cannot explain here. To handle all of them I did not use multi-valued fields and kept each record separately in my Solr index. I believe FieldCollapsing is exactly what I need, but when I tried it, it did not work for me. I've attached a screenshot to my question body. Please take a look and see if you can help. Thanks a lot – Adil Malik Nov 01 '12 at 14:38
  • OK, FieldCollapsing fixed it. I was on version 2 and had to upgrade to version 4, and then it worked :) Thanks – Adil Malik Nov 03 '12 at 15:20

How about using your email field as the unique key, so that no duplicates are allowed? Search for <uniqueKey> in the wiki page for schema.xml: https://wiki.apache.org/solr/SchemaXml

Jérôme R