I want to write a Perl program which reads a file and extracts the dates in it. However, if a date appears more than once, I want to print it only once. For example:

On 01/10/2011 I went home. On 02/02/2012, I
went to my school. On 02/02/2012, I went
to London.

The output should be:

01/10/2011
02/02/2012

I can do it by adding the dates to an array and checking the array every time I read a new date, but I am asking for a more efficient way. Is there a logical way to do it, or a data structure in Perl for this?
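For comparison, the array-based approach described above might look roughly like this. It is only a sketch: the sub name `unique_dates_array` and the inline sample string are illustrative, not from the original post. Every new date triggers a linear scan of the array, which is the cost the question hopes to avoid.

```perl
use strict;
use warnings;

# Array-based deduplication: before keeping a date, scan every date kept
# so far. Each check is a linear search, so total work grows with
# (number of dates found) x (number of distinct dates).
sub unique_dates_array {
    my ($text) = @_;
    my @kept;
    for my $date ($text =~ m{(\d\d/\d\d/\d{4})}g) {
        # grep in boolean context yields the number of matches (0 = not seen)
        push @kept, $date unless grep { $_ eq $date } @kept;
    }
    return @kept;
}

my $text = "On 01/10/2011 I went home. On 02/02/2012, I\n"
         . "went to my school. On 02/02/2012, I went\n"
         . "to London.\n";
print "$_\n" for unique_dates_array($text);
```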

Birei
user2870

2 Answers

This scans the file line by line, looking for dates in \d\d/\d\d/\d{4} format, and saves them as keys in a hash.

When the file has been read, it prints these unique keys (sorted as strings).

perl -nE '$s{$_}++ for m| (\d\d/\d\d/\d{4}) |xg;}{say for sort keys %s' file

It can be translated into a more readable form (plus some checks):

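use strict;
use warnings;

open my $fh, "<", "file" or die "Cannot open file: $!";

# Each distinct date becomes a hash key; a repeated date only bumps its count.
my %s;
while (my $line = <$fh>) {

  # /x allows the spaces inside the pattern; /g in list context returns every match on the line
  my @dates = $line =~ m| (\d\d/\d\d/\d{4}) |xg;

  for my $date (@dates) {
    $s{$date} += 1;
  }
}
close $fh;

# Hash keys are unique, so each date is printed once (sorted as strings, not chronologically).
for my $date (sort keys %s) {

  print $date, "\n";
}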
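Because `sort keys` orders the dates as strings, the original order of appearance is lost. If that order matters, a common variant (the `%seen` idiom from perlfaq4) keeps the hash for the uniqueness check but pushes each date onto an array the first time it is seen. A sketch, assuming the input arrives through a filehandle; the sub name `dates_in_order` is illustrative, and the demo uses an in-memory filehandle with the first two dates swapped so the difference from `sort keys` is visible.

```perl
use strict;
use warnings;

# Keep dates in the order they first appear. !$seen{$_}++ is true only the
# first time a date is met: the post-increment returns the old value (undef).
sub dates_in_order {
    my ($fh) = @_;
    my (%seen, @dates);
    while (my $line = <$fh>) {
        push @dates, grep { !$seen{$_}++ } $line =~ m{(\d\d/\d\d/\d{4})}g;
    }
    return @dates;
}

# Demo on an in-memory filehandle instead of a real file.
my $text = "On 02/02/2012 I went home. On 01/10/2011, I\n"
         . "went to my school. On 02/02/2012, I went\n"
         . "to London.\n";
open my $fh, "<", \$text or die "Cannot open in-memory file: $!";
print "$_\n" for dates_in_order($fh);
close $fh;
```

Each lookup is still a single hash access, so this costs about the same as the answer's version; only the output order changes.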
mpapec
  • To explain what the above answer does: it searches through the text with a regex to find all matches of the form XX/XX/XXXX and increments that key in a dictionary (implicitly creating it if it doesn't exist). Then it just prints out the keys of the dictionary. This is basically the same as your suggestion. – Sysyphus May 31 '13 at 14:53
  • Minor terminology correction: In Perl, the data type used in this answer is called a "hash", not a "dictionary". – Dave Sherohman May 31 '13 at 15:02
  • +1 for the explanation and serious tone. My answer used the "nearly James Bond" switch (`-0777`) for slurping, which I think is not safe for Unicode? It would be nice if most of `List::MoreUtils` were in `List::Util` (i.e. CORE) – G. Cito May 31 '13 at 20:33
  • You can slurp any binary file with `-0777`, so Unicode should not cause any trouble – mpapec May 31 '13 at 20:42
  • How can you, instead of filtering dates, create a sequence of all possible 4-letter combinations of each word? – pcproff Nov 20 '14 at 03:59
  • @pcproff please post a new question, and explain what the input and output data should look like. – mpapec Nov 20 '14 at 06:22
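The comments above mention slurping with `-0777`. Inside a script, the usual counterpart is to undefine `$/` (the input record separator) locally, so one read returns the whole file and a single list-context `/g` match collects every date. A sketch under those assumptions; the helper name `slurp` is illustrative, and an in-memory filehandle stands in for the real file.

```perl
use strict;
use warnings;

# With $/ undefined, <$fh> returns everything up to end-of-file,
# which is what the -0777 switch arranges for a one-liner.
sub slurp {
    my ($fh) = @_;
    local $/;    # $/ is undef only until this sub returns
    return scalar <$fh>;
}

my $text = "On 01/10/2011 I went home. On 02/02/2012, I\n"
         . "went to my school. On 02/02/2012, I went\n"
         . "to London.\n";
open my $fh, "<", \$text or die "Cannot open in-memory file: $!";
my $content = slurp($fh);
close $fh;

# One /g match over the whole content, then deduplicate through hash keys.
my %s;
$s{$_}++ for $content =~ m{(\d\d/\d\d/\d{4})}g;
print "$_\n" for sort keys %s;
```

Slurping keeps the whole file in memory, so the line-by-line version in the answer is the safer default for large inputs.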
If you are open to installing a module for this (I know it seems like overkill), List::MoreUtils has a uniq function. Everyone avert your eyes... it's Friday afternoon, very hot, and possibly time to slurp (-0777) beer:

perl -'MList::MoreUtils qw(uniq)' -0777nE '@dates = m|(\d\d/\d\d/\d{4})|xg ; @x = uniq(@dates); say "@x" ' file.txt

Sorry ;-)
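If installing List::MoreUtils is not an option, newer releases of the core List::Util module also export a `uniq` function (added, as far as I know, in List::Util 1.45). A sketch assuming that version or later is available; it works on an inline string rather than a file, and calls `uniq` once on all matches, so the result does not depend on how the input is split into records.

```perl
use strict;
use warnings;
# Assumption: List::Util 1.45 or newer. Older versions stop here at
# compile time with a version error instead of failing later.
use List::Util 1.45 qw(uniq);

my $text = "On 01/10/2011 I went home. On 02/02/2012, I\n"
         . "went to my school. On 02/02/2012, I went\n"
         . "to London.\n";

# uniq keeps the first occurrence of each value and preserves order.
my @dates = uniq($text =~ m{(\d\d/\d\d/\d{4})}g);
print "@dates\n";
```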

G. Cito
  • Hm, here it runs uniq and prints on every line, but it should do that once reading is complete – mpapec May 31 '13 at 20:45
  • Odd, I'm using 5.18. It might be interesting to see whether different versions (of `perl` and `List::MoreUtils`) behave differently; `perlbrew exec` to the rescue! Anyway, your solution is module-free and uses the butterfly `}{`, so it is the best by default ;-) Cheers! – G. Cito Jun 02 '13 at 17:04