
I'm implementing a filter for Russian swear words and my head is about to explode. Is there a regex that matches all Cyrillic characters in a word except the first and the last one? \\B\\w\\B doesn't work on the Cyrillic alphabet. The regex will be used in replaceAll(). [ёа-я]+ is good for matching Russian words, but I still need to exclude the first and last characters.

Test Input: ОченьПлохоеСлово

Output: О**************о

  • [See also](https://stackoverflow.com/questions/1716609). Java [appears to support this syntax](https://docs.oracle.com/javase//tutorial/essential/regex/unicode.html). – Karl Knechtel Apr 01 '23 at 20:47
  • Why don't you get a list of Russian swear words and then create a ternary trie? It is not possible to accomplish a ban on swear words without it. Make a list of all possible permutations, then give me the list to make a full-blown trie. Or you can make it yourself. – sln Apr 02 '23 at 00:35

1 Answer


You can use this expression:

(?<=[а-яА-ЯёЁ])[а-яА-ЯёЁ](?=[а-яА-ЯёЁ])

with the substitution `*`. It converts `ОченьПлохоеСлово и ещё, but this is good word` into `О**************о и е*ё, but this is good word`.

Here:

  • (?<=[а-яА-ЯёЁ]) is a lookbehind: it matches only if the main pattern is preceded by a Cyrillic letter,
  • [а-яА-ЯёЁ] is the main pattern: any single Cyrillic letter,
  • (?=[а-яА-ЯёЁ]) is a lookahead: it matches only if the main pattern is followed by a Cyrillic letter.

Very simple example of matching on regex101.
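
For use with Java's regex API, the expression above could be wrapped roughly like this. It is a minimal sketch, not a definitive implementation: the class name `CyrillicMask` and the method `mask` are made up for illustration, and it assumes the source file is compiled as UTF-8 so the Cyrillic literals survive. Note that the character class covers the Russian alphabet only, not every Cyrillic letter.

```java
import java.util.regex.Pattern;

public class CyrillicMask {
    // A Russian letter with a Russian letter on both sides.
    // The lookbehind and lookahead are zero-width, so only the middle letter is consumed.
    private static final Pattern INNER_LETTER =
            Pattern.compile("(?<=[а-яА-ЯёЁ])[а-яА-ЯёЁ](?=[а-яА-ЯёЁ])");

    // Hypothetical helper: replaces every inner Russian letter with '*'.
    public static String mask(String text) {
        return INNER_LETTER.matcher(text).replaceAll("*");
    }

    public static void main(String[] args) {
        // prints: О**************о и е*ё, but this is good word
        System.out.println(mask("ОченьПлохоеСлово и ещё, but this is good word"));
    }
}
```

One- and two-letter words such as "и" or "да" are left untouched, because no letter in them has a Cyrillic neighbour on both sides. Precompiling the `Pattern` is optional; `String.replaceAll` with the same expression should give the same result.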

markalex
  • We can also enable [`Pattern.UNICODE_CHARACTER_CLASS`](https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#UNICODE_CHARACTER_CLASS) via `(?U)` like `System.out.println("ОченьПлохоеСлово".replaceAll("(?U)\\B\\w\\B","*"));` – Pshemo Apr 01 '23 at 21:24
  • @Pshemo, yes we can, but it will replace any other letters in the word too. My understanding is that the author wants to replace only Cyrillic letters. – markalex Apr 01 '23 at 21:45
  • True, but if the word the OP wants to censor contains only Cyrillic letters, the solution with the Unicode flag may also be nice to know about (which was the point of my comment; I never claimed that your answer was in any way wrong). – Pshemo Apr 02 '23 at 07:24
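
To make the difference discussed in these comments concrete, here is a small Java sketch comparing the two expressions on mixed input. The class and method names are hypothetical, and it assumes the source file is compiled as UTF-8.

```java
public class MaskComparison {
    // Expression from the answer: only Russian letters surrounded by Russian letters.
    public static String maskCyrillicOnly(String text) {
        return text.replaceAll("(?<=[а-яА-ЯёЁ])[а-яА-ЯёЁ](?=[а-яА-ЯёЁ])", "*");
    }

    // Expression from the comment: (?U) makes \w and \B Unicode-aware,
    // so any inner word character is replaced, Latin letters included.
    public static String maskAnyWordChar(String text) {
        return text.replaceAll("(?U)\\B\\w\\B", "*");
    }

    public static void main(String[] args) {
        System.out.println(maskCyrillicOnly("Слово word")); // prints: С***о word
        System.out.println(maskAnyWordChar("Слово word"));  // prints: С***о w**d
    }
}
```

On purely Cyrillic input both give the same result; they only diverge once Latin letters (or other word characters such as digits) appear inside a word.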