
We had a bug in a library that was caused by one of its inputs containing Unicode characters.

It was fixed by adding use utf8; to the script using that library.

However, adding use utf8; to the library itself (so ALL scripts using that library would be fixed) had no effect.

Why? Can this be addressed?

ikegami
DVK

2 Answers


From the documentation:

The use utf8 pragma tells the Perl parser to allow UTF-8 in the program text in the current lexical scope.

In other words, the pragma is lexically scoped: it affects only the file (or enclosing block) in which it appears, not the package as a whole and not any other file that loads or is loaded by it. You need to put it in every file whose source code contains UTF-8 characters. If your input comes from somewhere else, you need to ensure that it is properly decoded; the pragma has no effect on that.
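As a rough illustration of what "properly decoded" can look like, here is a minimal sketch using the core `Encode` module and an `:encoding(UTF-8)` layer on a filehandle. The variable names and the in-memory filehandle are assumptions made for the example; in a real program the bytes would come from a file, STDIN, a database, and so on.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Bytes as they might arrive from outside the program:
# the UTF-8 encoding of "é" (U+00E9) is the two bytes C3 A9.
my $bytes = "\xC3\xA9";

# Option 1: decode explicitly into a string of characters.
my $chars = decode('UTF-8', $bytes);

# Option 2: let an I/O layer decode while reading. An in-memory
# filehandle stands in for a real file here (illustrative only).
open my $fh, '<:encoding(UTF-8)', \$bytes or die "open failed: $!";
my $line = <$fh>;
close $fh;

printf "raw: %d, decoded: %d, via layer: %d\n",
    length($bytes), length($chars), length($line);
# prints "raw: 2, decoded: 1, via layer: 1"
```

Neither option involves `use utf8`, because no non-ASCII character appears in the source code itself.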

PS: I understand that you meant use utf8, not use utf-8 (the latter is not a valid pragma).

GMB
  • More specifically, it only applies to source code, so it needs to go in the file with the UTF-8 source code. If your input comes from somewhere else, like a file, STDIN, database, then you will need to ensure it is properly decoded in the manner appropriate to that medium; `use utf8` will have no effect on that. – Grinnz Apr 04 '19 at 20:47
  • @Grinnz: yes it is sure worth mentioning this, I updated my answer accordingly. Thanks! – GMB Apr 04 '19 at 20:50
  • Thank you. Ironically, my main problem was that our minimal test was actually testing the wrong thing (by hard-coding the UTF-8 test string as part of the script). This answer explained what the issue was AND that our original solution was indeed solving the wrong problem (we needed `use open` and an upgrade to Perl 5.12 instead). – DVK Apr 04 '19 at 21:30

use utf8; tells Perl that the current file is encoded using UTF-8.

You have a script that's encoded using UTF-8, so you had to add use utf8; to the script. (Without it, you might think you have my $x = "é";, but you're telling Perl my $x = "Ã©";, because each of the two bytes in the UTF-8 encoding of é is read as a separate character.)

Adding it to a module makes no sense if it's the script that's encoded using UTF-8. The directive must be added to each file (script or module) that's encoded using UTF-8. (If you pass the bad $x to a module, and the module produces junk because of that, it's still the script that needs to be fixed.)
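The effect can be seen in a small sketch, assuming the file containing it is saved as UTF-8. The variable names are illustrative. The same literal is parsed once without the pragma in scope and once with it:

```perl
use strict;
use warnings;

# Assumption: this source file is saved as UTF-8.

# No `use utf8` in scope: the literal is read as its two raw bytes (C3 A9).
my $without = "é";

my $with;
{
    use utf8;      # in effect only until the end of this block
    $with = "é";   # read as the single character U+00E9
}

printf "without: %d, with: %d\n", length($without), length($with);
# prints "without: 2, with: 1"
```

The pragma changes how the literal is parsed where it is written, which is why adding it to a different file (the library) does nothing for a string created in the script.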

ysth
ikegami