What is the "proper" way to bundle a required-at-runtime data file with a Perl module, such that the module can read its contents before being used?

A simple example would be this Dictionary module, which needs to read a list of (word,definition) pairs at startup.

package Reference::Dictionary;

# TODO: This is the Dictionary, which needs to be populated from
#  data-file BEFORE calling Lookup!
our %Dictionary;

sub new {
  my $class = shift;
  return bless {}, $class;
}

sub Lookup {
  my ($self,$word) = @_;
  return $Dictionary{$word};
}
1;

and a driver program, Main.pl:

use Reference::Dictionary;

my $dictionary = Reference::Dictionary->new;
print $dictionary->Lookup("aardvark");

Now, my directory structure looks like this:

root/
  Main.pl
  Reference/
    Dictionary.pm
    Dictionary.txt

I can't seem to get Dictionary.pm to load Dictionary.txt at startup. I've tried a few methods to get this to work, such as...

  • Using BEGIN block:

    BEGIN {
      open(FP, '<', 'Dictionary.txt') or die "Can't open: $!\n";
      while (<FP>) {
        chomp;
        my ($word, $def) = split(/,/);
        $Dictionary{$word} = $def;
      }
      close(FP);
    }
    

    No dice: Perl looks for Dictionary.txt in the current working directory (here, the directory containing "Main.pl"), not in the directory of the module, so the open fails with File Not Found.

  • Using DATA:

    BEGIN {
      while (<DATA>) {
        chomp;
        my ($word, $def) = split(/,/);
        $Dictionary{$word} = $def;
      }
      close(DATA);
    }
    

    and at end of module

    __DATA__
    aardvark,an animal which is definitely not an anteater
    abacus,an oldschool calculator
    ...
    

    This too fails because BEGIN executes at compile-time, before DATA is available.

  • Hard-code the data in the module

    our %Dictionary = (
      aardvark => 'an animal which is definitely not an anteater',
      abacus => 'an oldschool calculator'
      ...
    );
    

    Works, but is decidedly non-maintainable.

Similar question here: How should I distribute data files with Perl modules? but that one deals with modules installed by CPAN, not modules relative to the current script as I'm attempting to do.

Greg Kennedy
  • Note that prototypes (ie. `sub new()`) have *no effect* on methods. They're not function signatures, [they're something completely different](https://stackoverflow.com/questions/297034/why-are-perl-5s-function-prototypes-bad). Don't use them unless you know what you're doing. If you want function signatures consider [Method::Signatures](https://metacpan.org/pod/Method::Signatures), [Kavorka](https://metacpan.org/pod/Kavorka) or [Function::Parameters](https://metacpan.org/pod/Function::Parameters). – Schwern Nov 05 '15 at 02:55
  • I suggest using `DATA` with `INIT` instead of `BEGIN` to ensure that the data is initialised before run time. It also makes it more self-documenting. – Borodin Nov 05 '15 at 03:09
  • @Schwern oops my bad, too much C/C++ lately! I'll edit to remove these. – Greg Kennedy Nov 05 '15 at 04:48
  • You should also avoid using capital letters in method identifiers. They are reserved for Perl globals such as the package name, so your `sub Lookup` should be `sub lookup` – Borodin Nov 05 '15 at 04:52
  • @Borodin Upper case method names aren't reserved. It's not normal Perl style, but they're not reserved. – Schwern Nov 05 '15 at 05:09

2 Answers

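There's no need to load the dictionary at BEGIN time. BEGIN time is relative to the file being loaded. When your main.pl says use Dictionary, all the code in Dictionary.pm is compiled and then run, including its top-level statements, because use is itself carried out at compile time of the calling file. So just put the loading code early in Dictionary.pm, outside any BEGIN block.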

package Dictionary;

use strict;
use warnings;

my %Dictionary;  # There is no need for a global
while (<DATA>) {
    chomp;
    my ($word, $def) = split(/,/);
    $Dictionary{$word} = $def;
}

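You can also load from Dictionary.txt located in the same directory. The trick is to build the path from the module's own location instead of relying on the current working directory. You can get that location from __FILE__, which is the path to the current file (i.e. Dictionary.pm). Note that __FILE__ is not guaranteed to be absolute: if the module was found through a relative @INC entry, it will be a relative path.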

use File::Basename;

# Get the directory Dictionary.pm is located in.
my $dir = dirname(__FILE__);

open(my $fh, '<', "$dir/Dictionary.txt") or die "Can't open: $!\n";

my %Dictionary;
while (<$fh>) {
    chomp;
    my ($word, $def) = split(/,/);
    $Dictionary{$word} = $def;
}
close($fh);
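
If the program might chdir() after the module is loaded, or @INC contains relative entries, it may be safer to pin the directory down as an absolute path once, at load time. A minimal sketch, assuming that is a concern for you; `data_file_path` is a hypothetical helper name, not something from the code above:

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Spec;

# Resolve the directory of this source file to an absolute path once,
# while the current working directory is still the one Perl used to
# find the file.
my $module_dir = dirname( File::Spec->rel2abs(__FILE__) );

# Hypothetical helper: full path of a file that sits next to this one.
sub data_file_path {
    my ($name) = @_;
    return File::Spec->catfile( $module_dir, $name );
}

print data_file_path('Dictionary.txt'), "\n";
```

The open() call would then use data_file_path('Dictionary.txt') instead of "$dir/Dictionary.txt".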

Which should you use? DATA is easier to distribute. A separate, parallel file is easier for non-coders to work on.


Rather than loading the whole dictionary as soon as the library is loaded, it is more polite to wait and load it only when it's needed.

use File::Basename;

# Load the dictionary from Dictionary.txt
sub _load_dictionary {
    my %dictionary;

    # Get the directory Dictionary.pm is located in.
    my $dir = dirname(__FILE__);

    open(my $fh, '<', "$dir/Dictionary.txt") or die "Can't open: $!\n";

    while (<$fh>) {
        chomp;
        my ($word, $def) = split(/,/);
        $dictionary{$word} = $def;
    }

    return \%dictionary;
}

# Get the possibly cached dictionary
my $Dictionary;
sub _get_dictionary {
    return $Dictionary ||= _load_dictionary();
}

sub new {
    my $class = shift;

    my $self = bless {}, $class;
    $self->{dictionary} = $self->_get_dictionary;

    return $self;
}

sub lookup {
    my $self = shift;
    my $word = shift;

    return $self->{dictionary}{$word};
}

Each object now contains a reference to the shared dictionary (eliminating the need for a global) which is only loaded when an object is created.
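
To see that the `||=` cache really loads only once, here is a small self-contained sketch of the same pattern. It reads from an in-memory string instead of Dictionary.txt so it can run anywhere; `$sample` and `$load_count` exist only for this illustration. It also passes a limit of 2 to split, on the assumption that a definition may itself contain commas.

```perl
use strict;
use warnings;

my $load_count = 0;    # illustration only: counts real loads
my $sample = "aardvark,an animal which is definitely not an anteater\n"
           . "abacus,an oldschool calculator\n";

sub _load_dictionary {
    $load_count++;
    my %dictionary;

    # Open the string as a file; a real module would open Dictionary.txt.
    open( my $fh, '<', \$sample ) or die "Can't open in-memory data: $!\n";
    while ( my $line = <$fh> ) {
        chomp $line;
        # Limit of 2: split at the first comma only.
        my ( $word, $def ) = split /,/, $line, 2;
        $dictionary{$word} = $def;
    }
    close($fh);

    return \%dictionary;
}

# Load on first use, then hand back the same hash reference.
my $cache;
sub _get_dictionary {
    return $cache ||= _load_dictionary();
}

my $first  = _get_dictionary();
my $second = _get_dictionary();

print "loads: $load_count\n";    # prints "loads: 1"
print $first->{abacus}, "\n";    # prints "an oldschool calculator"
```

Both calls return the same reference, so every object built on top of this shares one copy of the data.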

Schwern

I suggest using DATA with INIT instead of BEGIN to ensure that the data is initialised before run time. It also makes it more self-documenting.

Or it may be more appropriate to use a UNITCHECK block, which is executed as early as possible, immediately after the library file is compiled, and so can be considered an extension of the compilation:

package Dictionary;

use strict;
use warnings;

my %dictionary;
UNITCHECK {
    while ( <DATA> ) {
        chomp;
        my ($k, $v) = split /,/;
        $dictionary{$k} = $v;
    }
}
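
For reference, the order in which these phase blocks fire can be checked with a tiny standalone script. This is only a sketch of the ordering within a single file (UNITCHECK needs Perl 5.10 or later); `@order` is a name made up for the demonstration.

```perl
use strict;
use warnings;

# Declared without an initialiser so that no runtime assignment
# wipes out what the phase blocks pushed earlier.
our @order;

push @order, 'runtime';    # ordinary top-level code runs last

INIT      { push @order, 'INIT' }         # after all compilation, before run time
UNITCHECK { push @order, 'UNITCHECK' }    # right after this file is compiled
BEGIN     { push @order, 'BEGIN' }        # as soon as this block is parsed

print "@order\n";    # prints "BEGIN UNITCHECK INIT runtime"
```

BEGIN runs while the file is still being parsed, before the parser has reached the `__DATA__` token, which is why reading DATA there fails, whereas UNITCHECK, INIT and ordinary top-level code all run after the file has been compiled.
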
Borodin