-8

If I have

$t = '20110512102331';

and would like only the first 4 characters from $t.

How do I do that?

Sandra Schlichting
  • 13
    You ask a *lot* of Perl questions, often very basic stuff. Have you not yet learned to search the documentation yourself by now? Why not? – Konerak Mar 16 '12 at 10:12
  • 2
    Based on your history, I think my book _Learning Perl_ could help. :) – brian d foy Mar 16 '12 at 21:31

4 Answers

6

By using the substr function like this -

my $t = "20110512102331";
my $four = substr($t, 0, 4);
MD Sayem Ahmed
3

For your particular problem, extracting what appears to be the year from a string, substr happens to work, but it is really the wrong tool here. Its idea of "character" is not our idea of "character": it counts code points, so the different normalized forms of résumé produce different results. You probably want the first four graphemes, which you can match with \X (although for ASCII text a grapheme and a character give the same result):

use v5.10.1;
use utf8;
use strict;
use warnings;

use Unicode::Normalize qw(NFD NFC);

my $string = '20110512102331';
say "$string → ", substr $string, 0, 4;

my $ustring = NFD( 'résumé' );
say "NFD $ustring → ", substr $ustring, 0, 4;

$ustring = NFC( 'résumé' );
say "NFC $ustring → ", substr $ustring, 0, 4;

$ustring = NFD( 'résumé' );
say "\\X with NFD $ustring → ", $ustring =~ m/(\X{4})/;

$ustring = NFC( 'résumé' );
say "\\X with NFC $ustring → ", $ustring =~ m/(\X{4})/;

Notice the NFD result is different:

$ perl -C substr.pl
20110512102331 → 2011
NFD résumé → rés
NFC résumé → résu
\X with NFD résumé → résu
\X with NFC résumé → résu

However, you can get grapheme-aware behavior from a substr method if you wrap the string in a Unicode::GCString object:

use v5.10.1;
use utf8;
use strict;
use warnings;

use Unicode::GCString;
use Unicode::Normalize qw(NFD);

my $gcstring = Unicode::GCString->new( NFD('résumé') );
say "$gcstring → ", $gcstring->substr( 0, 4 );

This gets the right result:

$ perl -C gcsubstr.pl
résumé → résu

However, all of this sidesteps the fact that the string is more than a collection of characters. Those characters have special meaning, so you can use that meaning to do the right thing without thinking about string operations. The DateTime::Format::Strptime module is a nice way to parse arbitrary date formats if you can describe the format:

use v5.10.1;
use utf8;
use strict;
use warnings;

use DateTime::Format::Strptime;

my $Strp = DateTime::Format::Strptime->new(
    pattern => '%Y%m%d%H%M%S',
    );
my $Strf = DateTime::Format::Strptime->new(
    pattern => '%Y',
    );

my $dt = $Strp->parse_datetime('20110512102331');

my $year = $Strf->format_datetime($dt);

say "year is $year";

You might also want to see "How can I parse dates and convert time zones in Perl?"

No matter which way you decide to do it, you can hide the implementation details in a subroutine so you can change it without disrupting the rest of the program.
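As a minimal sketch of that idea (not part of the original answer): the subroutine name year_from_timestamp and the strict 14-digit check are assumptions made for illustration. The body currently uses substr, but callers would not notice if it were later replaced with a DateTime::Format::Strptime parse.

```perl
use strict;
use warnings;

# Hypothetical helper; the name and the validation rule are illustrative only.
# Callers depend on this interface, not on how the year is extracted.
sub year_from_timestamp {
    my ($timestamp) = @_;

    # Accept only the 14-digit YYYYMMDDhhmmss layout used in the question.
    return unless defined $timestamp && $timestamp =~ /\A[0-9]{14}\z/;

    # Current implementation: plain substr. This line could be swapped
    # for a real date parser without touching any caller.
    return substr $timestamp, 0, 4;
}

my $year = year_from_timestamp('20110512102331');
print "year is $year\n";
```

Returning nothing for malformed input is a design choice here; a program that prefers to fail loudly could die instead.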

brian d foy
2

Easiest: use the substr function:

my $firstfour = substr($t,0,4); 

Another way would be using a regexp:

my $firstfour = ($t =~ /(.{0,4}).*/s ? $1 : $t);

or, shorter, by calling the regexp in list context:

my ($firstfour) = $t =~ /(.{0,4})/s;
Konerak
  • 2
    `m/(.{4})/ms` is a better fit for "the first four characters of a string". Of course there is the corner case of strings shorter than four characters. There, the standard assumption is to return as many characters as the string has. So we could handle that case with `m/(.{1,4})/ms`. – Axeman Mar 16 '12 at 12:10
  • 2
    @Axeman: that still doesn't handle the case of an empty string, which would leave `$firstfour` undefined. Also the `/m` qualifier is irrelevant here. `/(.{0,4})/s` is correct. – Borodin Mar 16 '12 at 13:16
  • Thanks Axeman, good points! @Borodin: it depends what should happen when a string is empty, but your "just return an empty string" makes sense. I'll update! – Konerak Mar 16 '12 at 13:50
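The difference between the three quantifiers discussed in these comments can be checked with a short script. This is only a sketch; the helper name capture_first is made up for the demonstration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first capture, or undef if the match fails.
sub capture_first {
    my ($input, $regex) = @_;
    my ($first) = $input =~ $regex;   # list context: empty list on failure
    return $first;
}

# The three quantifiers proposed in the comments.
my %pattern = (
    '{4}'   => qr/(.{4})/s,
    '{1,4}' => qr/(.{1,4})/s,
    '{0,4}' => qr/(.{0,4})/s,
);

# A full timestamp, a string shorter than four characters, and an empty string.
for my $input ('20110512102331', 'abc', '') {
    for my $name (sort keys %pattern) {
        my $first = capture_first($input, $pattern{$name});
        printf "%-6s on '%s': %s\n", $name, $input,
            defined $first ? "'$first'" : 'undef';
    }
}
```

Only `{0,4}` yields a defined value for every input: `{4}` fails on 'abc' and on the empty string, and `{1,4}` fails on the empty string.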
2
$t='20110512102331'; 
print substr($t, 0, 4);

See perldoc -f substr for more information.

flesk
Sebastian Stumpf