Background
Various languages use diacritics: special marks attached to "normal" letters in one way or another. They might change how a letter sounds, or just hint at how it is supposed to be pronounced.
The problem
When strings are searched or sorted the basic way, the comparison uses the Unicode values of the characters, so sorted results can appear to be in the wrong order and searches can miss text that should match.
Searching should let me find the occurrences of one string within another: not just whether they exist, but also where.
For example, if I take the French string "Le Garçon" and search for "rc", the match should start at the position of "r" and end at the position of "ç". Knowing the locations matters if you want to highlight where the text was found.
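To make the requirement concrete, here is a minimal sketch of the behavior I'm after, not a recommended solution. It folds each character with java.text.Normalizer, drops non-spacing marks, and remembers which original index each folded character came from. The class and method names (MarkInsensitiveSearchSketch, find, fold) are made up for illustration, and it deliberately ignores surrogate pairs and locale-specific rules.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;

class MarkInsensitiveSearchSketch {

    /** Returns {start, endExclusive} as indices into the ORIGINAL text, or null if there is no match. */
    static int[] find(String text, String query) {
        StringBuilder folded = new StringBuilder();
        List<Integer> origin = new ArrayList<>(); // origin.get(k) = index in text that produced folded char k
        for (int i = 0; i < text.length(); i++) {
            for (char c : fold(text.charAt(i)).toCharArray()) {
                folded.append(c);
                origin.add(i);
            }
        }
        StringBuilder foldedQuery = new StringBuilder();
        for (int i = 0; i < query.length(); i++) {
            foldedQuery.append(fold(query.charAt(i)));
        }
        if (foldedQuery.length() == 0) return null;

        int hit = folded.indexOf(foldedQuery.toString());
        if (hit < 0) return null;

        int start = origin.get(hit);
        int end = origin.get(hit + foldedQuery.length() - 1) + 1;
        // Extend the match over marks that trail the last matched letter in the original text.
        while (end < text.length() && isMark(text.charAt(end))) end++;
        return new int[] {start, end};
    }

    /** Decomposes one char (NFD), drops non-spacing marks, and lower-cases what is left. */
    static String fold(char ch) {
        String nfd = Normalizer.normalize(String.valueOf(ch), Normalizer.Form.NFD);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < nfd.length(); i++) {
            char c = nfd.charAt(i);
            if (!isMark(c)) sb.append(Character.toLowerCase(c));
        }
        return sb.toString();
    }

    static boolean isMark(char c) {
        return Character.getType(c) == Character.NON_SPACING_MARK;
    }

    public static void main(String[] args) {
        int[] range = find("Le Gar\u00E7on", "rc");
        System.out.println(range[0] + ".." + range[1]); // prints 5..7
    }
}
```

The returned range is what a highlighter would need: it starts at "r" and ends just after "ç". Whether this kind of folding is appropriate for every language is exactly what I'm unsure about.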
What I've found
Collator and CollationKey can help for sorting: https://stackoverflow.com/a/75334111/878126
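As a rough sketch of what I understand the Collator approach to be (the class name CollatorSortSketch and the sample words are mine, not from the linked answer): with the strength set to PRIMARY, java.text.Collator treats accent and case differences as ties, so the order follows the base letters.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class CollatorSortSketch {

    /** Returns a sorted copy of words, comparing base letters only (accents and case are ignored). */
    static List<String> sortIgnoringAccents(List<String> words, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(Collator.PRIMARY); // PRIMARY = base letters only
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(collator); // Collator implements Comparator<Object>
        return sorted;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("z\u00E8bre", "\u00E9cole", "eau", "Zoo");
        // A plain sort by char value would put "Zoo" first and "école" last.
        System.out.println(sortIgnoringAccents(words, Locale.FRENCH));
    }
}
```

For large lists, CollationKey (via collator.getCollationKey) can avoid recomputing the comparison data on every compare, which is what the linked answer is about.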
Normalizer might help for searching, as it can decompose letters that carry a diacritic so that the marks can be stripped: https://stackoverflow.com/a/10700023/878126
But these don't seem to cover some languages. Hebrew, which I know, has Niqqud signs (roughly equivalent to vowels in English, but optional). As Unicode characters they come after the letter they belong to, even though the sign itself is displayed inside or around the letter.
https://en.wikipedia.org/wiki/Diacritic#Hebrew
In this case, normalizing the word doesn't change anything, so searching for the text and sorting it becomes a problem.
Example:
import java.text.Normalizer
import java.util.regex.Pattern

// Matches the Combining Diacritical Marks block plus modifier letters (Lm) and modifier symbols (Sk)
val regex = Pattern.compile("[\\p{InCombiningDiacriticalMarks}\\p{IsLm}\\p{IsSk}]+").toRegex()
val string = "בְּרֵאשִׁית"
val length = string.length // this is 11, not the 6 visible letters, because each Niqqud sign is its own char
val normalized = Normalizer.normalize(string, Normalizer.Form.NFD)
val result = normalized.replace(regex, "") // this is still exactly the same as the original, instead of "בראשית"
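To see why the pattern above leaves the Hebrew word untouched, here is a small diagnostic sketch (the class name NiqqudInspectionSketch and its helper methods are made up for illustration). It reports, for each code point, its Unicode block and whether it is a non-spacing mark. As far as I can tell, the Niqqud signs are non-spacing marks that live in the Hebrew block, not in the Combining Diacritical Marks block that the regex targets.

```java
class NiqqudInspectionSketch {

    /** True if the code point belongs to the block matched by \p{InCombiningDiacriticalMarks}. */
    static boolean inCombiningDiacriticalMarksBlock(int codePoint) {
        return Character.UnicodeBlock.of(codePoint) == Character.UnicodeBlock.COMBINING_DIACRITICAL_MARKS;
    }

    /** True if the code point has general category Mn (non-spacing mark). */
    static boolean isNonSpacingMark(int codePoint) {
        return Character.getType(codePoint) == Character.NON_SPACING_MARK;
    }

    /** One line per code point: hex value, Unicode block, and whether it is a non-spacing mark. */
    static String describe(String s) {
        StringBuilder out = new StringBuilder();
        s.codePoints().forEach(cp -> out.append(String.format(
                "U+%04X block=%s nonSpacingMark=%b%n",
                cp, Character.UnicodeBlock.of(cp), isNonSpacingMark(cp))));
        return out.toString();
    }

    public static void main(String[] args) {
        // Same word as in the example, written with escapes so the code points are explicit.
        String word = "\u05D1\u05B0\u05BC\u05E8\u05B5\u05D0\u05E9\u05B4\u05C1\u05D9\u05EA";
        System.out.print(describe(word));
    }
}
```

For comparison, U+0301 (combining acute accent, the mark that NFD splits off from "é") is both a non-spacing mark and inside the Combining Diacritical Marks block, which is why the same regex works for French.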
I was told (here) that perhaps the ICU4J library could help with these two operations (search and sort), but I can't find information about how.
The questions
Is there a better solution in the Java/Kotlin API for searching and sorting while ignoring diacritics? One that covers as many languages as possible?
Is it possible that ICU4J can help? If so, how? I couldn't find much information or many samples showing how to use it for this purpose in Java/Kotlin.