I want to search a Solr index by full name. The documents come from different sources, so names are not spelled consistently: a name can appear as firstname lastname or as lastname firstname, and there can be one or more first names and one or more last names.

So if a name is:

firstname: ALBERTO JORGE
lastname: ALONSO CALEFACCION

The spellings can be:

ALBERTO JORGE ALONSO CALEFACCION
ALBERTO J. ALONSO CALEFACCION
ALBERTO J ALONSO CALEFACCION
ALBERTO ALONSO CALEFACCION

and:

ALONSO CALEFACCION ALBERTO JORGE
ALONSO CALEFACCION ALBERTO J.
ALONSO CALEFACCION ALBERTO J
ALONSO CALEFACCION ALBERTO

Searching on the last names alone with "ALONSO CALEFACCION"~0 returns correct results.

But how can I match all of these spellings with a single query? The query will be generated by a program from user input.

The search is further complicated because Spanish names can contain extra words such as "y" and "de", and these words should not be required to match (in our case). So the name in the database could be something like: ALBERTO JORGE ALONSO Y CALEFACCION

Thanks for your help.

I use Solr 3.6

PROMES
  • possible duplicate of [Searching names with Apache Solr](http://stackoverflow.com/questions/5516503/searching-names-with-apache-solr) – Max Charas Sep 13 '13 at 11:15

1 Answer

If you store the first name in a firstname field and the last name in a lastname field, you can build the query in your program. For example, if the user typed two words, you can query (firstname:(word1) AND lastname:(word2)) OR (firstname:(word2) AND lastname:(word1)). The explicit parentheses avoid relying on the query parser's operator precedence.
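
As a rough illustration of building that query from user input, here is a minimal Python sketch. It assumes the index has firstname and lastname fields as described above; the helper names normalize and build_name_query, and the PARTICLES set, are hypothetical. It tries every split of the typed words into a first-name part and a last-name part, in both orders, and it does not escape Solr special characters, which real input would need.

```python
# Optional words mentioned in the question; extend as needed (assumption).
PARTICLES = {"y", "de"}


def normalize(text):
    """Lowercase the words, strip trailing dots from initials, drop particles."""
    words = [w.rstrip(".").lower() for w in text.split()]
    return [w for w in words if w and w not in PARTICLES]


def build_name_query(user_input):
    """Build a Solr query string that tries both field orders for every split."""
    words = normalize(user_input)
    if not words:
        return ""
    if len(words) == 1:
        # A single word may be either a first name or a last name.
        return "firstname:({0}) OR lastname:({0})".format(words[0])
    clauses = []
    for i in range(1, len(words)):
        left = " AND ".join(words[:i])
        right = " AND ".join(words[i:])
        clauses.append("(firstname:({}) AND lastname:({}))".format(left, right))
        clauses.append("(firstname:({}) AND lastname:({}))".format(right, left))
    return " OR ".join(clauses)


if __name__ == "__main__":
    print(build_name_query("Alberto Alonso"))
```

Lowercasing here assumes the fields are analyzed with a lowercase filter, as in the field type below; if they are not, keep the original case.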

You can even define a special field type for these fields so that initials and abbreviated forms match as prefixes:

<fieldType name="AuthorsPrefix" class="solr.TextField" positionIncrementGap="100">
  <!-- Index time: store every leading prefix of each token (a, al, alb, ...) -->
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="200" side="front"/>
  </analyzer>
  <!-- Query time: no n-grams, so a typed "j" matches the indexed prefix "j" of "jorge" -->
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>

You can read more here.

Another approach is to generate all possible combinations during indexing and search for authors in this combo field:

ALBERTO JORGE ALONSO CALEFACCION
ALBERTO J ALONSO CALEFACCION
ALBERTO ALONSO CALEFACCION
ALONSO CALEFACCION ALBERTO JORGE
ALONSO CALEFACCION ALBERTO J
ALONSO CALEFACCION ALBERTO
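
The list above could be produced before indexing with something like the following Python sketch. The function name name_variants is hypothetical, and it assumes the first names and last names are already available as separate, non-empty lists. It only covers the forms listed here (full first names, first name plus initials, first name only, each in both orders); dotted initials or optional words such as "y" and "de" would need extra variants.

```python
def name_variants(firstnames, lastnames):
    """Return the spelling variants of a full name to index in a combo field."""
    # Forms of the first-name part: all names, first name + initials, first name only.
    first_forms = [list(firstnames)]
    if len(firstnames) > 1:
        first_forms.append([firstnames[0]] + [n[0] for n in firstnames[1:]])
        first_forms.append([firstnames[0]])
    last = " ".join(lastnames)
    # Each form appears once as "first last" and once as "last first".
    forward = [" ".join(form) + " " + last for form in first_forms]
    reverse = [last + " " + " ".join(form) for form in first_forms]
    return forward + reverse


if __name__ == "__main__":
    for variant in name_variants(["ALBERTO", "JORGE"], ["ALONSO", "CALEFACCION"]):
        print(variant)
```

Each returned string could then be added as one value of a multi-valued field, so a phrase query against that field matches any of the spellings.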

You can generate the synonyms automatically by writing your own SearchComponent.

Artem Lukanin
  • Thanks Max and Artem for your links. These don't give a full solution (can there be a 100% solution?) but they are enough to work with. My conclusion: I can get about 75% correct results from Solr and have to post-process the results with regex in PHP to improve them, as in: [link](http://stackoverflow.com/questions/18773711/how-check-different-spellings-of-a-persons-full-name) – PROMES Sep 13 '13 at 14:34
  • The 100% approach is to list all possible names in the index – Artem Lukanin Sep 16 '13 at 12:50