
I need to regex-search a string for occurrences that meet these conditions:

  • on word boundary
  • case-insensitive
  • ignore diacritics

My code:

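import java.util.regex.Matcher;
import java.util.regex.Pattern;

CharSequence text = "One Twó";
String searchString = "two";
// (?i) = case-insensitive; \b = the match must start at a word boundary
Pattern p = Pattern.compile("(?i)\\b" + searchString);
Matcher m = p.matcher(text);
while (m.find()) {
    int s = m.start(); // index of the first matched char
    int e = m.end();   // index just past the last matched char
}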

The first two conditions are handled by the (?i) flag and the \b boundary in the pattern.

I still need to achieve the third goal, ignoring diacritics, so that in the example above the search string "two" would match "Twó" in the text. How can this be accomplished?

M. Justin
Pointer Null

1 Answer


I don't have a perfect regex-based solution. Maybe one exists, maybe it doesn't.

A suggestion for a workaround, though: you could remove the diacritics before you try to match the string.
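Here is one way that workaround might look in Java. It's a sketch under my own assumptions, not a tested library routine: it uses java.text.Normalizer to decompose each character to NFD (so "ó" becomes "o" plus a combining acute accent), drops the combining marks matched by \p{M}, and runs the question's (?i)\b pattern on the stripped copy. Stripping can change the string length (for example when the text already contains separate combining marks), so the sketch keeps a table mapping every stripped index back to the original index, which keeps start()/end() meaningful for the original text. The names DiacriticInsensitiveSearch, find and stripMarks are hypothetical, and the search string is wrapped in Pattern.quote so it's matched literally (the original code didn't do that).

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
public class DiacriticInsensitiveSearch {

    // One or more Unicode combining marks (acute, grave, diaeresis, ...).
    private static final Pattern MARKS = Pattern.compile("\\p{M}+");

    // Decomposes s to NFD (e.g. "ó" -> "o" + U+0301) and drops the combining marks.
    static String stripMarks(String s) {
        return MARKS.matcher(Normalizer.normalize(s, Normalizer.Form.NFD)).replaceAll("");
    }

    // Returns {start, end} pairs (end exclusive) as indices into the ORIGINAL text,
    // one per occurrence of searchString that starts at a word boundary,
    // ignoring case and diacritics.
    public static List<int[]> find(CharSequence text, String searchString) {
        StringBuilder folded = new StringBuilder();
        // origIndex.get(k) = index in text of the code point that produced folded.charAt(k)
        List<Integer> origIndex = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int cp = Character.codePointAt(text, i);
            String base = stripMarks(new String(Character.toChars(cp)));
            for (int k = 0; k < base.length(); k++) {
                folded.append(base.charAt(k));
                origIndex.add(i);
            }
            i += Character.charCount(cp);
        }
        // Sentinel so that a match ending at the end of folded maps to text.length().
        origIndex.add(text.length());

        // (?iu): case-insensitive, with Unicode-aware case folding.
        Pattern p = Pattern.compile("(?iu)\\b" + Pattern.quote(stripMarks(searchString)));
        Matcher m = p.matcher(folded);
        List<int[]> result = new ArrayList<>();
        while (m.find()) {
            result.add(new int[] {origIndex.get(m.start()), origIndex.get(m.end())});
        }
        return result;
    }

    public static void main(String[] args) {
        CharSequence text = "One Tw\u00f3"; // "One Twó", escaped to avoid source-encoding issues
        for (int[] r : find(text, "two")) {
            System.out.println(r[0] + "-" + r[1] + ": " + text.subSequence(r[0], r[1]));
        }
    }
}
```

With the question's input, find returns a single range [4, 7), which is "Twó" in the original text. Note that this only helps for accents that NFD actually decomposes: letters such as "ø" or "ł" have no canonical decomposition, so they still won't match "o" or "l".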

Related question:

aioobe