
I am trying to remove accents from my text by running a Perl script on it, in which I use the tr operator (the simplest method I found):

I tried:

tr/àâäéèëêîïôöûùüç/aaaeeeeiioouuuc/;

It removes the accents, but I get the characters 'aa' instead of 'a', 'ae' instead of 'e', etc.

Gilles Quénot
Mostafa

1 Answer


Better to use a proper module like Text::Undiacritic =)

#!/usr/bin/perl
use warnings;
use strict;
use utf8;
binmode(STDIN, ":utf8");
binmode(STDOUT, ":utf8");
binmode(STDERR, ":utf8");

use Text::Undiacritic qw(undiacritic);

my $string = "C'est l'été à Paris ?\n";
print undiacritic $string;

OUTPUT:

C'est l'ete a Paris ?
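For comparison, the tr approach from the question can also be made to work. A likely cause of the doubled characters (an assumption, since the full script is not shown) is that the script lacks `use utf8`, so Perl reads each accented literal as two bytes and tr maps the bytes one by one. The sketch below assumes the script itself is saved as UTF-8; the helper name `strip_accents_tr` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                               # decode the literals in this file as characters
binmode(STDOUT, ":encoding(UTF-8)");

# Hypothetical helper: map each accented character to its plain counterpart.
# Both lists hold 15 characters, in the same order.
sub strip_accents_tr {
    my ($text) = @_;
    $text =~ tr/àâäéèëêîïôöûùüç/aaaeeeeiioouuuc/;
    return $text;
}

print strip_accents_tr("C'est l'été à Paris ?\n");
```

This only covers the characters listed in the tr pattern (lowercase French accents here), which is one reason a module is the more robust choice.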

NOTE

Since you asked about strings with accents, undiacritic() will work for removing them, but it will not handle, for example, typographic ligatures. If you pass the string

C'est l'été à Paris Lætitia ?

it will not substitute æ.
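If ligatures matter, one possible workaround is to handle them explicitly. The sketch below does not use Text::Undiacritic; it swaps in a different technique, decomposing the text with the core Unicode::Normalize module, deleting the combining marks, and then expanding a small hand-written ligature table. The helper name and the table contents are assumptions for illustration and would need extending for other languages.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;
use Unicode::Normalize qw(NFD);
binmode(STDOUT, ":encoding(UTF-8)");

# Hand-written table (an assumption): ligatures have no canonical
# decomposition, so NFD alone leaves them untouched.
my %ligature = ('æ' => 'ae', 'Æ' => 'AE', 'œ' => 'oe', 'Œ' => 'OE');

# Hypothetical helper name.
sub strip_accents_and_ligatures {
    my ($text) = @_;
    $text = NFD($text);                     # "é" becomes "e" + combining acute
    $text =~ s/\p{Mn}//g;                   # drop the combining marks
    $text =~ s/([æÆœŒ])/$ligature{$1}/g;    # expand the listed ligatures
    return $text;
}

print strip_accents_and_ligatures("C'est l'été à Paris Lætitia ?\n");
```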

Welcome to the truly tricky world of Unicode and UTF-8. A good pointer

Gilles Quénot