I'm using FTS4 in my Android application to implement full-text search. The data, which comes from an API, contains diacritics and accents. I've created two columns in the database: one stores the original data and the other stores the same data with diacritics and accents stripped (using Normalizer). The search works when I search for words without diacritics or accents. The problem arises when I want to highlight the search query where it occurs in the text.

For example, take this sentence, which I got from SO:

James asked, “’Tis Renée’s and Noël’s great‐grandparents’ 1970's-ish summer‐house, t'isn’t it?” Receiving no answer, he shook his head--and walked away.

If I search for Renee, it highlights Renée. But when I search for Renees, it finds text containing the word Renée’s but, because of the apostrophe, does not highlight it.

    Search Term: Renee
    Highlighted Output: Renée
    
    Search Term: Renees
    Highlighted Output: <whitespace>Renée’ <-- doesn't show the expected output
    Expected Output: Renée’s

If I use replaceAll to remove all the apostrophes before matching, the word Renée’s does get highlighted, but only up to the apostrophe (Renée’), and the highlight also covers the whitespace before the word. The highlight is pushed back even further when more apostrophes have been stripped earlier in the paragraph.
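The shift can be reproduced outside Android. The following is a minimal sketch in plain Java, assuming the paragraph uses the curly apostrophe (U+2019) and using a shortened, accent-free version of the sample sentence. The class and method names are hypothetical and only serve the illustration.

```java
class ApostropheOffsetDemo {
    // Shortened sample text (an assumption for this demo); \u2019 is the curly apostrophe.
    static final String ORIGINAL = "\u2019Tis Renee\u2019s and Noel\u2019s";

    // Finds needle in a copy of text with curly apostrophes removed and
    // returns {start, end} measured in the STRIPPED copy.
    static int[] matchInStripped(String text, String needle) {
        String stripped = text.replace("\u2019", "");
        int start = stripped.indexOf(needle);
        return new int[] {start, start + needle.length()};
    }

    public static void main(String[] args) {
        // Applying stripped-copy offsets to the original text highlights the wrong range.
        int[] r = matchInStripped(ORIGINAL, "Renees");
        System.out.println("[" + ORIGINAL.substring(r[0], r[1]) + "]"); // prints "[ Renee]"

        int[] n = matchInStripped(ORIGINAL, "Noels");
        System.out.println("[" + ORIGINAL.substring(n[0], n[1]) + "]"); // prints "[d Noe]"
    }
}
```

Every apostrophe removed before a match moves the highlight one character to the left, so the first word is off by one and the second by two.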

Basically I want to show Renée’s in the paragraph displayed to the user and highlight the whole word even if the user searches for Renees.

Here's my code to highlight searched text:

    if (searchQuery != null) {
        String paragraph = data.getParagraph();
        SpannableStringBuilder sb = new SpannableStringBuilder(paragraph);

        String normalizedText = Normalizer.normalize(paragraph, Normalizer.Form.NFD).replaceAll("\\p{InCombiningDiacriticalMarks}+", "").toLowerCase();

        // Remove all apostrophes -- this works but pushes back the highlighted text color
        // because it doesn't count all stripped apostrophes in the original paragraph.
        //String normalizedText = Normalizer.normalize(paragraph, Normalizer.Form.NFD).replaceAll("\\p{InCombiningDiacriticalMarks}+", "").replaceAll("'", "").toLowerCase();

        Pattern word = Pattern.compile(searchQuery, Pattern.CASE_INSENSITIVE);
        Matcher match = word.matcher(normalizedText);

        while (match.find()) {
            BackgroundColorSpan fcs = new BackgroundColorSpan(Color.YELLOW);
            sb.setSpan(fcs, match.start(), match.end(), Spannable.SPAN_EXCLUSIVE_EXCLUSIVE);
        }
        text.setText(sb);
    }

How do I highlight the searched word even when it contains an apostrophe?

VLAZ
input

1 Answer

You can add the ['’]? pattern (which matches an optional ' or ’ character) between each pair of characters in searchQuery:

    Pattern word = Pattern.compile(TextUtils.join("['’]?", searchQuery.split("")), Pattern.CASE_INSENSITIVE);

This way, the search phrase will still match when an apostrophe appears between any two of its characters.
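Since TextUtils is Android-only, here is a minimal plain-Java sketch of the same idea that can be tried on a desktop JVM. It builds the regex with a StringBuilder instead of TextUtils.join and wraps each character in Pattern.quote so that regex metacharacters typed by the user are matched literally; that quoting is an addition, not part of the answer above. The class and method names are hypothetical. It also assumes the paragraph is in composed (NFC) form, so that the diacritic-stripped copy has the same length as the original and the match offsets can be used for the span directly.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ApostropheTolerantSearch {
    // Optional straight or curly (U+2019) apostrophe, as in the answer above.
    private static final String OPTIONAL_APOSTROPHE = "['\u2019]?";

    // Same stripping as in the question: decompose, drop combining marks, lowercase.
    static String stripDiacritics(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD)
                .replaceAll("\\p{InCombiningDiacriticalMarks}+", "")
                .toLowerCase();
    }

    // Joins the quoted characters of the query with the optional-apostrophe pattern.
    static Pattern buildPattern(String query) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < query.length(); ) {
            int cp = query.codePointAt(i);
            if (regex.length() > 0) {
                regex.append(OPTIONAL_APOSTROPHE);
            }
            regex.append(Pattern.quote(new String(Character.toChars(cp))));
            i += Character.charCount(cp);
        }
        return Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE);
    }

    // Returns {start, end} offsets of every match, usable as setSpan arguments.
    static List<int[]> findRanges(String paragraph, String query) {
        List<int[]> ranges = new ArrayList<>();
        if (query.isEmpty()) {
            return ranges;
        }
        Matcher m = buildPattern(query).matcher(stripDiacritics(paragraph));
        while (m.find()) {
            ranges.add(new int[] {m.start(), m.end()});
        }
        return ranges;
    }

    public static void main(String[] args) {
        String paragraph = "\u2019Tis Ren\u00e9e\u2019s and No\u00ebl\u2019s";
        for (int[] r : findRanges(paragraph, "Renees")) {
            // The apostrophe stays in the text, so offsets map back unchanged.
            System.out.println(paragraph.substring(r[0], r[1]));
        }
    }
}
```

In the Android code from the question, each returned pair would take the place of match.start() and match.end() in the sb.setSpan call. Because the apostrophes are never removed from the text being searched, no offset correction is needed.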


Wiktor Stribiżew