
I'm slowly fine-tuning my sieve filter. I noticed I was getting a lot of spam in Russian, so I thought I could filter on the presence of Cyrillic in the subject. I thought maybe three consecutive characters would be a good test, and it seems to work pretty well. Here's the line:

elsif header :regex "Subject" [ "[а-яА-Я]{3,}" ]

It's not ideal, because there are plenty of Cyrillic characters outside the А-Я range. Also, I'd like to do the same with CJK characters, and I'm not sure even how to begin with those.

Is it possible in sieve to specify a script as a character class? I've done it before in other regex implementations, but it seems to me that it's handled differently, if at all, by different regex flavours.

Thanks, Ben

1 Answer


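You can use the pattern below, provided the regex engine behind your Sieve implementation supports Unicode script properties. The Sieve regex extension is based on POSIX extended regular expressions, which do not define \p{...}, so test it on your server first. When you put the pattern in a Sieve quoted string, double each backslash (\\p{Cyrillic}), because Sieve drops a backslash in front of any character other than " or \.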

[\p{Cyrillic}\p{Han}]{3}

Details:

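  • [ - start of a character class
    • \p{Cyrillic} - any character in the Cyrillic script, including letters outside the а-я / А-Я range such as ё
    • \p{Han} - any Han ideograph (Chinese hanzi, Japanese kanji, Korean hanja); kana and hangul are separate scripts (\p{Hiragana}, \p{Katakana}, \p{Hangul})
  • ]{3} - end of the character class, repeated exactly three times. Since the pattern is not anchored, this also matches any longer run, so it behaves like {3,} here.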
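To see what the pattern matches before relying on it in a filter, here is a minimal sketch in Go, whose standard regexp package accepts the same \p{Cyrillic} and \p{Han} syntax. It only checks the pattern itself; it assumes nothing about whether your Sieve server's engine supports these classes. The function name hasScriptRun and the sample subjects are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// scriptRun matches three consecutive characters that each belong to
// the Cyrillic or the Han script (the pattern from the answer above).
var scriptRun = regexp.MustCompile(`[\p{Cyrillic}\p{Han}]{3}`)

// hasScriptRun is a hypothetical helper: it reports whether a subject
// line contains at least one such run anywhere in the string.
func hasScriptRun(subject string) bool {
	return scriptRun.MatchString(subject)
}

func main() {
	// Illustrative subject lines, not real data.
	subjects := []string{
		"Привет, мир",    // six Cyrillic letters in a row
		"你好世界",           // four Han ideographs in a row
		"Ёжик",           // starts with Ё, which lies outside А-Я
		"да, ok",         // only two Cyrillic letters in a row
		"Meeting at 3pm", // no Cyrillic or Han at all
	}
	for _, s := range subjects {
		fmt.Printf("%q -> %v\n", s, hasScriptRun(s))
	}
}
```

The first three subjects match and the last two do not, which is the behaviour the Subject test is meant to have: short runs of Cyrillic such as a two-letter word are ignored.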
Wiktor Stribiżew