I'm trying to strip the micro (μ) Unicode character from a string using Perl regexes. Take the string

$string = "This is a micro μ and some more μμμ";

A brute-force approach that removes all the 'more specialised' (non-ASCII) characters does the job:

$string =~ s/[\x80-\xFF]+//g;

But the following, which singles out the micro character, does not work for me:

$string =~ s/\xB5+//g;

I'm pretty sure U+00B5 is the Unicode code point for the micro sign. Any ideas where I'm going wrong?

TylerH
James B
  • Is your string properly recognised as Unicode by Perl? For example, what is `length("µ")`? It should be 1; if it's more than 1, then you should look at the [Encode](https://metacpan.org/pod/Encode) module to decode your UTF-8 byte string into a character string. Also be aware that the micro symbol (U+00B5) and the Greek small letter mu (U+03BC) look very similar, but are considered different characters. – tobyink Jul 04 '14 at 09:46
  • Yes, you're right. It's the Greek letter and not the micro sign - d'oh! – James B Jul 04 '14 at 10:02
  • Please provide the output of `printf("U+%v04X\n", $string);` or `use Data::Dumper; local $Data::Dumper::Useqq = 1; print(Dumper($string));`. – ikegami Jul 04 '14 at 14:40

1 Answer

The character in your string may not be the micro sign (U+00B5). Check whether it is the similar-looking Greek small letter mu (U+03BC), as tobyink suggested in his comment.

#!/usr/bin/perl
use strict;
use warnings;
use utf8;
my $string = "This is a micro μ and some more μμμ";
$string =~ s/\x{03BC}//g;
print $string;

Output: This is a micro and some more
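For what it's worth, here is why the two patterns in the question probably behaved as they did. Without `use utf8` (or an explicit decode), Perl sees the string as UTF-8 bytes, and Greek mu (U+03BC) is the two bytes CE BC. Both bytes fall inside `[\x80-\xFF]`, so the brute-force pattern removes them, but neither is B5, so `\xB5` never matches. The sketch below assumes the input arrives as raw UTF-8 bytes (e.g. from a file); the variable names and sample text are only illustrative. It decodes the bytes with the core `Encode` module, prints the code points of the non-ASCII characters so you can tell the two look-alikes apart, and then strips both.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumption: raw UTF-8 bytes, as they might arrive from a file.
# \xCE\xBC is U+03BC (Greek mu), \xC2\xB5 is U+00B5 (micro sign).
my $bytes = "micro \xCE\xBC and \xC2\xB5";

# Turn the byte string into a character string.
my $chars = decode('UTF-8', $bytes);

# List the code points of the non-ASCII characters.
printf "U+%04X\n", ord($_) for grep { ord($_) > 0x7F } split //, $chars;

# Remove both look-alike characters in one pass.
(my $clean = $chars) =~ s/[\x{00B5}\x{03BC}]+//g;

# Encode back to bytes before printing.
print encode('UTF-8', $clean), "\n";
```

Once the string holds characters rather than bytes, `length` counts each μ as 1, which is the check tobyink describes.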

Chankey Pathak
  • Great, works. Correct, I think it is the Greek letter. However, when I print the string with the `utf8` pragma enabled I get a warning `Wide character in print at ...`. I presume this is just me getting muddled with encodings. N.B. the strings are being read in from XML. – James B Jul 04 '14 at 10:01
  • Check this: [use utf8 gives me 'Wide character in print'](http://stackoverflow.com/questions/15210532/use-utf8-gives-me-wide-character-in-print) – Chankey Pathak Jul 04 '14 at 10:06
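
On the `Wide character in print` warning: it typically appears when a string containing characters above U+00FF is printed to a filehandle that has no encoding layer. A minimal sketch of one common fix, assuming the terminal or output file expects UTF-8, is to put an encoding layer on `STDOUT` with `binmode` before printing. The variable name `$stripped` is only illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;   # string literals in this source file are UTF-8

# Assumption: the output destination expects UTF-8.
binmode(STDOUT, ':encoding(UTF-8)');

my $string = "This is a micro μ and some more μμμ";

# Strip the Greek small letter mu from a copy of the string.
(my $stripped = $string) =~ s/\x{03BC}+//g;

print "$string\n";     # prints the original without a warning
print "$stripped\n";
```

Strings read from elsewhere (for example the XML mentioned above) would need decoding on the way in as well; how that happens depends on the parser or on the layer used when opening the file.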