How do Perl modules "work"?

Question

I am confused about Perl modules. I get that a module can be used to dump a whole bunch of subs into, tidying main code.

But, what is the relationship between modules?

Can modules "use" other modules?

Must I use export, or can I get away with dropping that stuff?

How do I solve circular use? (Security.pm uses Html.pm and Html.pm uses Security.pm). I know the obvious answer, but in some cases I need to use Security.pm routines in Html.pm and vice versa - not sure how to get around the problem.

If I remove all "use" clauses from all of my modules ... Then I have to use full sub qualifiers. For example, Pm::Html::get_user_friends($dbh, $uid) will use Security to determine if a friend is a banned user or not (banned is part of Security).

I just don't get this module thing. All the "tutorials" only speak of one module, never multiple, nor do they use real world examples.

The only time I've come across multiple modules is when using OO code. But nothing that tells me definitively one way or another how multiple modules interact.

This is not the kind of question that is appropriate for Stackoverflow (lacking a clear specific problem with a clear and specific answer/solution), but it has elicited a couple of very high quality answers that provide a starting point. However, this is not the kind of material one can learn well from these sorts of answers. One would be best served by [getting a book](http://shop.oreilly.com/product/0636920012689.do) and studying the material while asking specific questions about specific problems. See also [`perldoc perlmod`](https://metacpan.org/pod/perlmod) which is installed with Perl. — Sinan Ünür, Jun 20 '17 at 15:57
@SinanÜnür and now they deleted their account. Meh. If SO documentation didn't suck so much I'd say we move the answers there, but well... — simbabque, Jun 21 '17 at 07:52

simbabque · Answer 1 · 2017-06-20T09:31:52.183

Modules in Perl come in several flavors and have several different things that make them a module.

Definition

Something qualifies as a module if the following things are true:

the filename ends in .pm,
there is a package declaration in the file,
the package name matches the filename and path; Data/Dumper.pm contains package Data::Dumper,
it ends with a 1 or another true value.
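
As a minimal sketch of a file that meets all four criteria, assume it is saved as lib/My/Greeter.pm. The package name My::Greeter and the sub greet are made up for illustration.

```perl
# Hypothetical file: lib/My/Greeter.pm
package My::Greeter;    # matches the path My/Greeter.pm

use strict;
use warnings;

# A plain sub; callers reach it as My::Greeter::greet(...)
sub greet {
    my ($name) = @_;
    return "Hello, $name!";
}

1;    # the file has to end with a true value
```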

Conventions

Then there are some accepted conventions:

modules should usually only contain one package,
module names should be camel case and should not contain underscores _ (example: Data::Dumper, WWW::Mechanize::Firefox)
names that are entirely lowercase are reserved for pragmas (such as strict and warnings), not for regular modules.

Usually a module either contains a collection of functions (subs) or it is object oriented. Let's look at the collections first.

Modules as function collections

A typical module that bundles a bunch of functionality that is related uses a way to export those functions into your code's namespace. A typical example is List::Util. There are several ways to export things. The most common one is Exporter.

When you take a function from a module to put it into your code, that's called importing it. That is useful if you want to use the function a lot of times, as it keeps the name short. When you import it, you can call it directly by its name.

use List::Util 'max';
print max(1, 2, 3);

When you don't import it, you need to use the fully qualified name.

use List::Util (); # there's an empty list to say you don't want to import anything
print List::Util::max(1, 2, 3); # now it's explicit

Importing works because Perl installs a reference to the function behind List::Util::max into your namespace under the name max. If you don't import it, you need to use the full name. The imported name is a bit like a shortcut on your desktop in Windows.
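
To make the shortcut idea concrete, here is a rough sketch of what an import boils down to: a glob assignment that aliases a name in your package to the original sub. The name biggest is made up for illustration, and real exporters do more bookkeeping than this.

```perl
use strict;
use warnings;
use List::Util ();    # load the module, but import nothing

# Alias main::biggest to the sub behind List::Util::max.
# An import does essentially this on your behalf.
*main::biggest = \&List::Util::max;

print biggest(1, 2, 3), "\n";    # prints 3
```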

Your module does not have to provide exporting/importing. You can just use it as a collection of stuff and call them by their full names.
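
If you do want to offer imports, a minimal sketch with Exporter could look like the following. The package My::Math and its sub double are hypothetical. Both packages sit in one file here so the example runs as-is, which is why import is called by hand. With My::Math in its own My/Math.pm you would write use My::Math 'double'; instead.

```perl
package My::Math;
use strict;
use warnings;
use Exporter 'import';          # borrow Exporter's import method

our @EXPORT_OK = qw(double);    # names that callers may ask for

sub double { return 2 * $_[0] }

package main;

# Same effect as `use My::Math 'double';`, but done at run time
# because the package lives in this file.
My::Math->import('double');

print double(21), "\n";             # short name, prints 42
print My::Math::double(21), "\n";   # the full name always works, prints 42
```

Listing names in @EXPORT_OK rather than @EXPORT means nothing lands in the caller's namespace unless it is asked for.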

Modules as a collection of packages

While every .pm file is called a module, people often also refer to a whole collection of things that form a distribution as a module. Something like DBI comes to mind, which contains a lot of .pm files, which are all modules, but people still talk about the DBI module only.

Object oriented modules

Not every module needs to contain stand-alone functions. A module (now we're talking more about the kind described directly above) can also contain a class. In that case it usually does not export any functions. In fact, we do not call the subs functions any more, but rather methods. The package name becomes the name of the class, you create instances of the class called objects, and you call methods on those objects, which end up being the subs in your package.
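
A minimal sketch of such a class, using only core Perl. The class name Counter and its methods are made up for illustration. Real code often uses an object system such as Moo or Moose instead of a hand-written constructor.

```perl
package Counter;
use strict;
use warnings;

# Constructor: the class name arrives as the first argument.
sub new {
    my ($class, %args) = @_;
    my $self = { count => $args{start} // 0 };
    return bless $self, $class;    # tie the hash to the package
}

# Methods: the object arrives as the first argument.
sub increment {
    my ($self) = @_;
    return ++$self->{count};
}

sub count {
    my ($self) = @_;
    return $self->{count};
}

package main;

my $counter = Counter->new(start => 5);
$counter->increment;
print $counter->count, "\n";    # prints 6
```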

Loading modules

There are two main ways of loading a module in Perl. You can do it at compile time and at run time. The perl¹ compiler (yes, there is a compiler, although it's an interpreted language) loads files, compiles them, then switches to run time to run the compiled code. When it encounters a new file to load, it switches back to compile time, compiles the new code, and so on.

Compile time

To load a module at compile time, you use it.

use Data::Dumper;
use List::Util qw( min max );
use JSON ();

This is equivalent to the following.

BEGIN {
  require Data::Dumper;
  Data::Dumper->import;

  require List::Util;
  List::Util->import('min', 'max');

  require JSON;
  # no import here
}

The BEGIN block gets called during compile time. The example in the linked doc helps understand the concept of those switches back and forth.
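
A small sketch of that ordering: the BEGIN block appears later in the file, yet it runs first, because it executes as soon as the compiler has parsed it. The variable name @order is arbitrary.

```perl
use strict;
use warnings;

our @order;    # declaration only, nothing is assigned here

push @order, 'run time';                    # executes when the program runs

BEGIN { push @order, 'compile time' }       # executes while the file is compiled

print join(', ', @order), "\n";    # prints: compile time, run time
```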

The use statements usually go at the top of your program. You do pragmas first (use strict and use warnings should always be your very first things after the shebang), then use statements. They should be used so your program loads everything it needs during startup. That way at run time, it will be faster. For things that run for a long time, or where startup time doesn't matter, like a web application that runs on Plack, this is what you want.

Run time

When you want to load something during run time, you use require. It does not import anything for you. It also switches to compile time for the new file momentarily, but then goes back to run time where it left off. That makes it possible to load modules conditionally, which can be useful especially in a CGI context, where the cost of loading everything for every invocation of the program, although it might not be needed, outweighs the additional time it takes to parse a new file during the run.

require Data::Dumper;

if ($foo) {
    require List::Util;
    return List::Util::max( 1, 2, 3, $foo );
}

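It is also possible to pass a string or a variable to require, so you can not only conditionally load things, but also dynamically. Note that require treats a string as a file name relative to @INC rather than as a package name, so the :: separators have to become / and the .pm ending has to be added.

my $format = 'CSV'; # or JSON or XML or whatever
require "My/Parser/$format.pm"; # loads My::Parser::CSV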

This is pretty advanced, but there are use-cases for it.
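
A minimal sketch of loading by a package name held in a variable. The helper name load_class is made up for illustration, and List::Util merely stands in for a parser class so that the example runs anywhere. The core module Module::Load offers a ready-made load function for the same job.

```perl
use strict;
use warnings;

# Turn a package name into the file name that require expects,
# then load it at run time.
sub load_class {
    my ($class) = @_;
    (my $file = "$class.pm") =~ s{::}{/}g;    # List::Util -> List/Util.pm
    require $file;
    return $class;
}

my $class = load_class('List::Util');

# can() returns a code reference to the named sub, if it exists
my $max = $class->can('max');
print $max->(1, 2, 3), "\n";    # prints 3
```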

In addition, it's also possible to require normal Perl files with a .pl ending at run time. This is often done in legacy code (which I would call spaghetti). Don't do it in new code. Also don't do it in old code. It's bad practice.

Where to load what

In general you should always use or require every module that you depend on in any given module. Never rely on the fact that some other downstream part of your code loads things for you. Modules are meant to encapsulate functionality, so they should be able to at least stand on their own a little bit. If you want to reuse one of your modules later, and you forgot to include a dependency it will cause you grief.

It also makes it easier to read your code, as clearly stated dependencies and imports at the top help the maintenance guy (or future you) to understand what your code is about, what it does and how it does it.

Not loading the same thing twice

Perl takes care of that for you. When it parses the code at compile time, it keeps track of what it has loaded. Those things go into the super-global variable %INC, which is a hash of names that have been loaded, and where they came from.

$ perl -e 'use Data::Dumper; print Dumper \%INC'
$VAR1 = {
          'Carp.pm' => '/home/foo/perl5/perlbrew/perls/perl-5.20.1/lib/site_perl/5.20.1/Carp.pm',
          'warnings.pm' => '/home/foo/perl5/perlbrew/perls/perl-5.20.1/lib/5.20.1/warnings.pm',
          'strict.pm' => '/home/foo/perl5/perlbrew/perls/perl-5.20.1/lib/5.20.1/strict.pm',
          'constant.pm' => '/home/foo/perl5/perlbrew/perls/perl-5.20.1/lib/site_perl/5.20.1/constant.pm',
          'XSLoader.pm' => '/home/foo/perl5/perlbrew/perls/perl-5.20.1/lib/site_perl/5.20.1/x86_64-linux/XSLoader.pm',
          'overloading.pm' => '/home/foo/perl5/perlbrew/perls/perl-5.20.1/lib/5.20.1/overloading.pm',
          'bytes.pm' => '/home/foo/perl5/perlbrew/perls/perl-5.20.1/lib/5.20.1/bytes.pm',
          'warnings/register.pm' => '/home/julien/perl5/perlbrew/perls/perl-5.20.1/lib/5.20.1/warnings/register.pm',
          'Exporter.pm' => '/home/foo/perl5/perlbrew/perls/perl-5.20.1/lib/site_perl/5.20.1/Exporter.pm',
          'Data/Dumper.pm' => '/home/foo/perl5/perlbrew/perls/perl-5.20.1/lib/5.20.1/x86_64-linux/Data/Dumper.pm',
          'overload.pm' => '/home/foo/perl5/perlbrew/perls/perl-5.20.1/lib/5.20.1/overload.pm'
        };

Every call to use and require adds a new entry to that hash, unless it's already there. In that case, Perl does not load the file again. It still imports names for you if you loaded the module with use, though. Because a file is only ever loaded once, modules that load each other do not send Perl into an endless loop.
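
The following sketch shows two modules that load each other, as in the question. Demo::Security and Demo::Html are hypothetical stand-ins, and they are written to a temporary directory only to keep the example self-contained. Both use an empty import list and fully qualified names, since importing is the part that can break when a module is asked for its exports before it has finished loading.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

my $lib = tempdir(CLEANUP => 1);
make_path("$lib/Demo");

# Two hypothetical modules that depend on each other.
my %source = (
    'Demo/Security.pm' => <<'PERL',
package Demo::Security;
use strict;
use warnings;
use Demo::Html ();    # load only, import nothing

sub is_banned { my ($uid) = @_; return $uid == 13 }
sub notice    { return Demo::Html::bold('restricted') }
1;
PERL
    'Demo/Html.pm' => <<'PERL',
package Demo::Html;
use strict;
use warnings;
use Demo::Security ();    # already in %INC by now, so not loaded again

sub bold { return "<b>$_[0]</b>" }
sub user_link {
    my ($uid) = @_;
    return Demo::Security::is_banned($uid) ? bold('banned') : "<a>$uid</a>";
}
1;
PERL
);

for my $file (sort keys %source) {
    open my $fh, '>', "$lib/$file" or die "Cannot write $file: $!";
    print {$fh} $source{$file};
    close $fh or die "Cannot close $file: $!";
}

unshift @INC, $lib;
require Demo::Security;    # pulls in Demo::Html along the way

print Demo::Html::user_link(13), "\n";    # prints <b>banned</b>
print Demo::Security::notice(), "\n";     # prints <b>restricted</b>
```

The fully qualified calls are resolved when they run, by which time both files have been loaded completely.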

Another important thing to keep in mind with regards to legacy code is that if you require normal .pl files, you need to get the path right. Because the key in %INC will not be the module name, but instead the string you passed, doing the following will result in the same file being loaded twice.

perl -MData::Dumper -e 'require "scratch.pl"; require "./scratch.pl"; print Dumper \%INC'
$VAR1 = {
          './scratch.pl' => './scratch.pl',
          'scratch.pl' => 'scratch.pl',
          # ...
        };

Where modules are loaded from

Just like %INC, there is also a super global variable @INC, which contains the paths that Perl looks for modules in. You can add stuff to it by using the lib pragma, or via the environment variable PERL5LIB among other things.

use lib 'lib'; # quotes, not backticks; backticks would run a shell command
use My::Module; # this is in lib/My/Module.pm
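
A relative path such as lib is resolved against the current working directory, so it only works when the script is started from the right place. A common idiom, sketched here with the core module FindBin, anchors the path to the location of the script instead. The directory name lib is carried over from the example above.

```perl
use strict;
use warnings;
use FindBin;                    # sets $FindBin::Bin to the directory of the running script
use lib "$FindBin::Bin/lib";    # puts <script dir>/lib in front of @INC at compile time

print "$_\n" for @INC;          # lists the search path, one directory per line
```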

Namespaces

The packages you use in your modules define namespaces in Perl. By default when you create a Perl script without a package, you are in the package main.

#!/usr/bin/env perl
use strict;
use warnings;

sub foo { ... }

our $bar;

The sub foo will be available as foo inside the main .pl file, but also as main::foo from anywhere else. The shorthand is ::foo. The same goes for the package variable $bar. It's really $main::bar or just $::bar. Use this sparingly. You don't want stuff from your script to leak over in your modules. That's a very bad practice that will come back and bite you later.

In your modules, things are in the namespace of the package they are declared in. That way, you can access them from the outside (unless they are lexically scoped with my, which you should do for most things). That is mostly ok, but you should not be messing with internals of other code. Use the defined interface instead unless you want to break stuff.

When you import something into your namespace, all it is is a shortcut as described above. This can be useful, but you also do not want to pollute your namespaces. If you import a lot of things from one module into another module, those things will become available in that module too.

package Foo;
use List::Util 'max';

sub foo { return max(1, 2, 3) }

package main; # this is how you switch back
# use Foo; # only needed (and only possible) when package Foo lives in its own Foo.pm

print Foo::max(3, 4, 5); # this will work

Because you often do not want this to happen, you should choose carefully what you want to import into your namespace. On the other hand you might not care, which can be fine, too.

Making things private

Perl does not understand the concept of private or public. When you know how the namespaces work you can pretty much get to everything that is not lexical. There are even ways to get to lexicals too, but they involve some arcane black magic and I'll not go into them.

However, there is a convention on how to mark things as private. Whenever a function or variable starts with an underscore, it should be considered private. Modern tools like Data::Printer take that into account when displaying data.

package Foo;

# this is considered part of the public interface
sub foo { 
    _bar();
}

# this is considered private
sub _bar {
    ...
}

It's good practice to start doing things like that, and to keep away from the internals of modules on CPAN. Things that are named like that are not considered stable, they are not part of the API and they can change at any time.
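
If a naming convention is not enough, one option is to keep a helper in a lexical variable as a code reference, since lexicals never enter the package's symbol table. This is a sketch of that idea rather than a recommendation. The names Foo, reveal and $secret are made up, and the surrounding block only limits the lexical's scope because both packages share one file here.

```perl
use strict;
use warnings;

{
    package Foo;

    # A lexical code reference: no entry in Foo's symbol table
    my $secret = sub { return 'hidden ' . shift };

    # Public sub that uses the private helper
    sub reveal { return $secret->('value') }
}

package main;

print Foo::reveal(), "\n";    # prints: hidden value
print Foo->can('secret') ? "reachable\n" : "not reachable by name\n";
```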

Conclusion

This was a very broad overview of some of the concepts involved here. Most of it will quickly become second nature to you once you've used it a few times. I remember that it took me about a year during my training as a developer to wrap my head around that, especially exporting.

When you start a new module, the perldoc page perlnewmod is very helpful. You should read that and make sure you understand what it says.

¹ Notice the small p in perl? I'm talking about the program here, not the name of the language, which is Perl.

score 6 · Answer 2 · answered Jun 20 '17 at 08:54

(Your question would be a lot easier to read if you used capital letters.)

can modules "use" other modules?

Yes. You can load a module within another module. If you had looked at almost any CPAN module code, you would have seen examples of this.

must i use export, or can i get away with dropping that stuff?

You can stop using Exporter.pm if you want. But if you want to export symbol names from your modules then you either use Exporter.pm or you implement your own export mechanism. Most people choose to go with Exporter.pm as it's easier. Or you could look at alternatives like Exporter::Lite and Exporter::Simple.

how do i solve circular use (security.pm uses html.pm and html.pm uses security.pm)

By repartitioning your libraries to get rid of these circular dependencies. It might mean that you're putting too much into one module. Perhaps make smaller, more specialised, modules. Without seeing more explicit examples, it's hard to be much help here.

if i remove all "use" clauses from all of my PM's...then i have to use full sub qualifiers. for example, pm::html::get_user_friends($dbh, $uid) will use security to determine if a friend is a banned user or not (banned is part of security)

You're misunderstanding things here.

Calling use does two things. Firstly, it loads the module and secondly, it runs the module's import() subroutine. And it's the import() subroutine that does all of the Exporter.pm magic. And it's the Exporter.pm magic that allows you to call subroutines from other modules using short names rather than fully-qualified names.

So, yes, if you remove use statements from a module, then you will probably lose the ability to use short names for subroutines from other modules. But you're also relying on some other code in your program to actually load the module. So if you remove all use statements that load a particular module, then you won't be able to call the subroutines from that module. Which seems counter-productive.

It's generally a very good idea for all code (whether it's your main calling program or a module) to explicitly load (with use) any modules that it needs. Perl keeps track of modules that have already been loaded, so there is no problem with inefficiency due to modules being loaded multiple times. If you want to load a module and turn off any exporting of symbol names, then you can do that using syntax like:

use Some::Module (); # turn off exports

The rest of your question just seems like a rant. I can't find any more questions to answer.

You wrote *`use Some::Module (); # turn off exports`*. Why are you using the module developer perspective for the comment instead of *turn off imports* — Wolf, Jan 09 '19 at 16:12
@Wolf: Well, I can't remember what I was thinking when I wrote that eighteen months ago. But I often use the two terms interchangeably. — Dave Cross, Jan 09 '19 at 16:41