If I have
$t = '20110512102331';
and would like only the first 4 characters from $t.
How do I do that?
By using the substr function like this -
my $t = "20110512102331";
my $four = substr($t, 0, 4);
For your particular problem, extracting what appears to be the year from a string, substr happens to work, but it is really the wrong answer here. Its idea of a "character" is not our idea of a "character": it counts code points, not the graphemes a reader sees. Notice how the different normalized forms of résumé produce different results. You probably want the first four graphemes, which you can match with \X (although in ASCII a grapheme and a character give the same result):
use v5.10.1;
use utf8;
use strict;
use warnings;
use Unicode::Normalize qw(NFD NFC);
my $string = '20110512102331';
say "$string → ", substr $string, 0, 4;
my $ustring = NFD( 'résumé' );
say "NFD $ustring → ", substr $ustring, 0, 4;
$ustring = NFC( 'résumé' );
say "NFC $ustring → ", substr $ustring, 0, 4;
$ustring = NFD( 'résumé' );
say "\\X with NFD $ustring → ", $ustring =~ m/(\X{4})/;
$ustring = NFC( 'résumé' );
say "\\X with NFC $ustring → ", $ustring =~ m/(\X{4})/;
Notice that the substr result for the NFD string is different:
$ perl -C substr.pl
20110512102331 → 2011
NFD résumé → rés
NFC résumé → résu
\X with NFD résumé → résu
\X with NFC résumé → résu
However, you can get a grapheme-aware substr by wrapping the string in a Unicode::GCString object and calling its substr method:
use v5.10.1;
use utf8;
use strict;
use warnings;
use Unicode::GCString;
use Unicode::Normalize qw(NFD);
my $gcstring = Unicode::GCString->new( NFD('résumé') );
say "$gcstring → ", $gcstring->substr( 0, 4 );
This gets the right result:
$ perl -C gcsubstr.pl
résumé → résu
However, all of this sidesteps the fact that the string is more than a collection of characters. Those characters have a special meaning, so you can use that meaning to do the right thing without thinking about string operations. DateTime::Format::Strptime is a nice way to parse arbitrary date formats if you can describe the format:
use v5.10.1;
use utf8;
use strict;
use warnings;
use DateTime::Format::Strptime;
my $Strp = DateTime::Format::Strptime->new(
    pattern => '%Y%m%d%H%M%S',
);
my $Strf = DateTime::Format::Strptime->new(
    pattern => '%Y',
);
my $dt = $Strp->parse_datetime('20110512102331');
my $year = $Strf->format_datetime($dt);
say "year is $year";
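If DateTime is not installed, the core Time::Piece module can parse the same layout. This is only a minimal sketch that swaps in Time::Piece in place of DateTime::Format::Strptime; it assumes the input always uses the fixed YYYYMMDDhhmmss layout shown above:

```perl
use strict;
use warnings;
use Time::Piece;

# Parse the whole timestamp, then ask the object for the year.
# strptime dies if the string does not match the pattern.
my $tp   = Time::Piece->strptime( '20110512102331', '%Y%m%d%H%M%S' );
my $year = $tp->year;

print "year is $year\n";
```

Because the whole string is parsed as a date, the month, day, and time are also available from the same object (for example $tp->mon) without any further string slicing.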
You might also want to see "How can I parse dates and convert time zones in Perl?"
No matter which way you decide to do it, you can hide the implementation details in a subroutine so you can change it without disrupting the rest of the program.
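As a rough sketch of that idea, the subroutine below wraps the extraction behind one name. The name year_from_timestamp and the 14-digit validation rule are assumptions made for illustration, not anything the question specifies:

```perl
use strict;
use warnings;

# Hypothetical helper: callers only ask for "the year" and never
# see how it is extracted.
sub year_from_timestamp {
    my ($timestamp) = @_;

    # Assumption: a valid timestamp is exactly 14 ASCII digits
    # (YYYYMMDDhhmmss). Anything else yields undef.
    return unless defined $timestamp && $timestamp =~ /\A[0-9]{14}\z/;

    # Implementation detail, free to change later (for example to a
    # real date parser) without touching any caller.
    return substr $timestamp, 0, 4;
}

my $year = year_from_timestamp('20110512102331');
print "year is $year\n";
```

The rest of the program calls year_from_timestamp and checks for undef, so replacing the substr line with a date-parsing module later only touches the body of the subroutine.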
Easiest: use the substr function:
my $firstfour = substr($t,0,4);
Another way would be using a regexp:
my $firstfour = ($t =~ /(.{0,4}).*/s ? $1 : $t);
or, shorter, by calling the regexp in list context:
my ($firstfour) = $t =~ /(.{0,4})/s;
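To see how the two approaches behave when the string is shorter than four characters, here is a small comparison. It is only a sketch; the sample strings and the @results array are made up for illustration:

```perl
use strict;
use warnings;

my @results;

# Try a long string, a short one, and an empty one.
for my $t ( '20110512102331', 'abc', '' ) {
    my $by_substr  = substr( $t, 0, 4 );       # returns whatever is available
    my ($by_regex) = $t =~ /(.{0,4})/s;        # {0,4} also accepts fewer than 4

    push @results, [ $t, $by_substr, $by_regex ];
    printf "'%s' -> substr '%s', regex '%s'\n", $t, $by_substr, $by_regex;
}
```

Both forms return the whole string when it has fewer than four characters, so for this task the choice between them is mostly a matter of taste.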
$t='20110512102331';
print substr($t, 0, 4);
See perldoc -f substr for more information.