0

I'm trying to find a way to match user search queries with a database records in a search engine, using Spring, but I'm having trouble when the search query includes special characters such as vowels with accent.

Eg: search query = 'cafe'. Database record = 'café'

I'm using the stem of words to the query with the database records.

Which would be the most straight forward way of matching the query including a special character 'café' with the string that doesn't contain this special character 'cafe' and viceversa?

UPDATE

All the information I need is already cached so the approach of creating a new column in the db is not so appealing. I'm looking for a solution more spring based.

Ayesha
  • 37
  • 1
  • 11

1 Answers1

5

You could use java.text.Normalizer, like follow:

import java.text.Normalizer;
import java.text.Normalizer.Form;

public static String removeAccents(String text) {
    return text == null ? null :
        Normalizer.normalize(text, Form.NFD)
            .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
}

The Normalizer splits the original characters into a set of two character (letter and accent). For example the character (U+00E1) will be split in a (U+0061) and acute accent U+0301

The \p{InCombiningDiacriticalMarks}+ regular expression will match all such diacritic codes and we will replace them with an empty string.

And your query could be like:

SQL SERVER

SELECT * FROM Table
WHERE Column Like '%stringwithoutaccents%' COLLATE Latin1_general_CI_AI

ORACLE (from 10g)

SELECT * FROM Table
WHERE  NLSSORT(Column, 'NLS_SORT = Latin_AI')
Like NLSSORT('%stringwithoutaccents%', 'NLS_SORT = Latin_AI')

The CI stands for "Case Insensitive" and AI for "Accent Insensitive".

I hope it helps you.

Alessandro
  • 4,382
  • 8
  • 36
  • 70