3

This may be a Unicode issue, but the transcript below shows the problem:

$ cat leo
сказывать
ссказываю
сказав
BladeMight@Chandere ~ 23:24:58
$ cat leo | perl -pe 's/^с+каз/Рассказ/g'
Рассказывать
ссказываю
Рассказав
BladeMight@Chandere ~ 23:25:00
$ cat leo | sed -r 's/^с+каз/Рассказ/g'
Рассказывать
Рассказываю
Рассказав

I have a file, leo, whose contents are Cyrillic. I wanted to fix the misspelled words using the regex ^с+каз with perl -pe, but it replaces only the lines that start with a single с (the Cyrillic letter). In other words, + appears to do nothing here, although it works fine for non-Cyrillic text. With sed -r the same expression works perfectly. Why could that be?

BladeMight

1 Answer

4

Perl needs to be told that your source code is UTF-8 (-Mutf8) and that it should treat stdin and stdout as UTF-8 (-CS).

$ cat leo | perl -Mutf8 -CS -pe 's/^с+каз/Рассказ/g'
Рассказывать
Рассказываю
Рассказав
hobbs
  • 1
    NOTE: `use utf8` is needed only when the code itself contains UTF-8 text (for example, the search pattern in this particular case). The `-CS` option is required practically any time UTF-8 input or output is involved. – Polar Bear Nov 30 '19 at 00:11