23

I've been writing Perl for several years now and it is my preferred language for text processing (many of the genetics/genomics problems I work on are easily reduced to text processing problems). Perl as a language can be very forgiving, and it's possible to write very poor, but functional, code in Perl. Just the other day, my friend said he calls Perl a write-only language: write it once, understand it once, and never ever try to go back and fix it after it's finished.

While I have definitely been guilty of writing bad scripts at times, I feel like I have also written some very clear and maintainable code in Perl. However, if someone asked me what makes the code clear and maintainable, I wouldn't be able to give a confident answer.

What makes Perl code maintainable? Or maybe a better question is what makes Perl code hard to maintain? Let's assume I'm not the only one that will be maintaining the code, and that the other contributors, like me, are not professional Perl programmers but scientists with programming experience.

brian d foy
Daniel Standage
  • As a related subquestion: to what degree should you use/rely on $_ ? It seems obvious to me that while looping over an array then regexing the item that `foreach (@array) { if (/.../) { ...` is clear, but then again it is very Perl-only, if not write-only. – Joel Berger Dec 03 '10 at 22:05
  • (rather than `foreach my $item (@array) { if ($item =~ /.../) { ...` ) – Joel Berger Dec 03 '10 at 22:05
  • Perhaps I should add, I have started to adopt a policy that as long as I don't find myself manipulating $_ often, but simply using it as the default getting passed around, then I feel comfortable. Once I need to start doing a lot of things to $_ then it should have been named. Does this sound like a sane starting point? – Joel Berger Dec 03 '10 at 22:11
  • Bit late now, but this whole question would sit a lot better on programmers. http://programmers.stackexchange.com/ – Orbling Dec 04 '10 at 02:23
  • @Joel: Your expression is perfectly idiomatic and reasonable. If you have to use `$_`'s name a lot, then yes, it should be named. Otherwise, probably not. – tchrist Dec 04 '10 at 13:32
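
To make the `$_` discussion in the comments concrete, here is a minimal sketch of both styles. The input lines and variable names are made up for illustration. The first loop only passes `$_` along as the default; the second does several things with each item, so it names it.

```perl
use strict;
use warnings;

# Hypothetical input lines, made up for this illustration.
my @lines = ( 'gene1 chr1 100', '# comment', 'gene2 chr2 250' );

# Implicit $_: reasonable while it is only passed along as the default.
my @kept_lines;
foreach (@lines) {
    next if /^#/;
    push @kept_lines, $_;
}

# Named variable: clearer once the item is used in several ways.
my @summaries;
foreach my $line (@lines) {
    next if $line =~ /^#/;
    my ( $gene_id, $chromosome, $position ) = split ' ', $line;
    push @summaries, "$gene_id:$chromosome:$position";
}

print "$_\n" for @summaries;
```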

8 Answers

33

What makes Perl code unmaintainable? Pretty much anything that makes any other program unmaintainable. Assuming anything other than a short script intended to carry out a well defined task, these are:

  • Global variables
  • Lack of separation of concerns: Monolithic scripts
  • NOT using self-documenting identifiers (variable names and method names). E.g. you should know what a variable's purpose is from its name. $c bad. $count better. $token_count good.
    • Spell identifiers out. Program size is no longer of paramount concern.
    • A subroutine or method called doWork doesn't say anything
    • Make it easy to find the source of symbols from another package. Either use explicit package prefix, or explicitly import every symbol used via use MyModule qw(list of imports).
  • Perl-specific:
    • Over-reliance on short-cuts and obscure builtin variables
    • Abuse of subroutine prototypes
    • not using strict and not using warnings
  • Reinventing the wheel rather than using established libraries
  • Not using a consistent indentation style
  • Not using horizontal and vertical white space to guide the reader

etc etc etc.
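
A few of the points above (strict and warnings, spelled-out identifiers, symbols that are easy to trace back to their package) can be sketched in a handful of lines. The subroutine and its input are hypothetical; `Cwd` and `File::Basename` are core modules chosen only as examples.

```perl
use strict;
use warnings;

# Explicit import list: the origin of abs_path is visible at a glance.
use Cwd qw(abs_path);

# Import nothing; call the function with its full package name instead.
use File::Basename ();

# Self-documenting names instead of $c or $cnt.
sub count_tokens {
    my ($text) = @_;
    my @tokens      = split ' ', $text;
    my $token_count = scalar @tokens;
    return $token_count;
}

my $working_dir = abs_path('.');
my $dir_name    = File::Basename::basename($working_dir);

print "Running in $dir_name\n";
print count_tokens('ATG GCC TAA'), " tokens\n";
```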

Basically, if you think Perl is -f>@+?*<.-&'_:$#/%!, and you aspire to write stuff like that in production code, then, yeah, you'll have problems.

People tend to confuse stuff Perl programmers do for fun (e.g., JAPHs, golf etc) with what good Perl programs are supposed to look like.

I am still unclear on how they are able to separate in their minds code written for IOCCC from maintainable C.

Sinan Ünür
  • +1 for "over-reliance on short-cuts and obscure builtin variables". – JSBձոգչ Dec 03 '10 at 23:54
  • +1 Avoiding built-in variables is another way of saying, "Use named variables", which leads to another key to maintainability -- specifically, using good, self-documenting names for variables and methods. Programming is a form of writing, and word choice is critical. – FMc Dec 04 '10 at 01:04
  • Fully agree with FM re: Perl-agnostic concept of "self-documenting identifiers". Just because Perl itself has `$|`, it is not an excuse to create `$cnt` or better yet `$c` instead of `$count` or, in some cases, the even better, semantically precise `$character_count` – DVK Dec 04 '10 at 02:57
  • Another idea, though possibly a controversial one: don't go overboard on using imported symbols instead of writing out full package paths. While it makes the code longer and thus at times marginally less readable, using full package paths helps tremendously with "where the heck is this identifier defined?" when using scores of modules, as well as with "I didn't realize it was the `sorta()` function from the My::Sort::Apples module and not the Perl built-in `sort()`" – DVK Dec 04 '10 at 03:03
  • @DVK: You wrote `$|` but I think you meant `$.`. – tchrist Dec 04 '10 at 13:24
  • @tchrist - $| was a random example of a `perlvar` (well, semi-random because I was thinking of using it as an example for "add frigging comments when using perlvars" 5 minutes prior), but you are correct that `$.` which is a line count would have been a better example considering the sample variable names used later. – DVK Dec 04 '10 at 17:51
  • @Sinan - would you be OK with yourself or me incorporating some of the comments into your answer? – DVK Dec 04 '10 at 17:52
  • @DVK I find short variable names perfectly acceptable if their context makes their purpose obvious. – converter42 Dec 04 '10 at 17:52
  • @DVK: I never mind people going ahead and editing my answers to improve them. – Sinan Ünür Dec 04 '10 at 23:00
  • @DVK I tend to suggest to people to explicitly list all the symbols they want to import in the use statement. That makes it also easy to track the origin. So I'd write use Cwd qw(abs_path); – szabgab Dec 05 '10 at 08:21
  • while `use strict` itself is Perl specific, it (or at least `strict 'vars'`) embodies something that applies to other languages as well -- for example `Option Explicit` in VB or `implicit none` in Fortran. – hobbs Dec 05 '10 at 18:59
15

I suggest:

  1. Don't get too clever with the Perl. If you start playing golf with the code, it's going to result in harder-to-read code. The code you write needs to be readable and clear more than it needs to be clever.
  2. Document the code. If it's a module, add POD describing typical usage and methods. If it's a program, add POD to describe command line options and typical usage. If there's a hairy algorithm, document it and provide references (URLs) if possible.
  3. Use the /.../x form of regular expressions, and document them. Not everyone understands regexes well.
  4. Know what coupling is, and the pros/cons of high/low coupling.
  5. Know what cohesion is, and the pros/cons of high/low cohesion.
  6. Use modules appropriately. A nice well-defined, well-contained concept makes a great module. Reuse of such modules is the goal. Don't use modules simply to reduce the size of a monolithic program.
  7. Write unit tests for your code. A good test suite will not only allow you to prove your code is working today, but tomorrow as well. It will also let you make bolder changes in the future, with confidence that you are not breaking older applications. If you do break things, then, well, your test suite wasn't broad enough.
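
As a rough illustration of points 3 and 7, here is a sketch of a commented `/x` regex with a few unit tests written with the core `Test::More` module. The `parse_fasta_header` subroutine and the header format it accepts are assumptions made up for this example, not something from the question.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical helper: split a FASTA-style header line (already chomped)
# into a sequence ID and an optional description.
# Returns an empty list when the line is not a header.
sub parse_fasta_header {
    my ($header) = @_;

    my ( $sequence_id, $description ) = $header =~ m{
        \A >              # header lines start with a greater-than sign
        (\S+)             # sequence ID: everything up to the first whitespace
        (?: \s+ (.*) )?   # optional free-text description
        \z
    }x or return;

    return ( $sequence_id, $description // '' );
}

is_deeply( [ parse_fasta_header('>seq1 Homo sapiens BRCA1') ],
           [ 'seq1', 'Homo sapiens BRCA1' ],
           'ID and description are separated' );

is_deeply( [ parse_fasta_header('>seq2') ],
           [ 'seq2', '' ],
           'a missing description becomes an empty string' );

is_deeply( [ parse_fasta_header('ACGT') ],
           [],
           'a sequence line is not a header' );

done_testing();
```

The comments inside the regex do the job that a separate paragraph of explanation would otherwise have to do, and the tests record what the pattern is expected to accept and reject.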

But overall, the fact that you care enough about maintainability to ask a question about it, tells me that you're already in a good place and thinking the right way.

Paul Beckingham
13

I don't use all of Perl Best Practices, but that's the thing that Damian wrote it for. Whether or not I use all the suggestions, they are all worth at least considering.

Axeman
9

What makes Perl code maintainable?

At the least:

use strict;
use warnings;

See perldoc perlstyle for some general guidelines that will make your programs easier to read, understand, and maintain.
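
A small sketch of what `use strict` buys you: a misspelled variable name becomes a compile-time error instead of a silent bug. The string `eval` is used here only so that the demonstration itself still runs; the snippet and its deliberately misspelled `$token_cuont` are made up for illustration.

```perl
use strict;
use warnings;

# The misspelling of $token_count on the second line is deliberate.
my $snippet = <<'END_SNIPPET';
my $token_count = 3;
$token_cuont = $token_count + 1;
1;
END_SNIPPET

# The eval'd code inherits "use strict" from this file, so the
# undeclared variable stops compilation and the message lands in $@.
my $compiled_ok  = eval $snippet;
my $strict_error = $@;

print "strict refused to compile the snippet:\n$strict_error"
    unless $compiled_ok;
```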

Eugene Yarmash
  • This is only step zero of a ten step procedure, imo. Implementing as much OO design as possible will get you much further toward maintainability than a procedural approach - whether or not strict is on. (which of course I do always use, tho) – zanlok Dec 05 '10 at 15:42
5

One factor very important to code readability that I haven't seen mentioned in other answers is the importance of white space, which is both Perl-agnostic and in some ways Perl-specific.

Perl lets you write VERY concise code, but concise code doesn't have to be all bunched together.

White space serves readability in several ways; not all of them are widely used, but most are useful:

  • Spaces around tokens, to make them easier to separate visually.

    This space is doubly important in Perl due to prevalence of line noise characters even in best-style Perl code.

    I find $myHashRef->{$keys1[$i]}{$keys3{$k}} to be less readable at 2am in the middle of a production emergency compared to the spaced-out $myHashRef->{ $keys1[$i] }->{ $keys3{$k} }.

    As a side note, if you find your code doing a lot of deep nested reference expressions all starting with the same root, you should absolutely consider assigning that root into a temporary pointer (see Sinan's comment/answer).

    A partial but VERY important special case of this is of course regular expressions. The difference was illustrated to death in all the main materials I recall (PBP, RegEx O'Reilly book, etc..) so I won't lengthen this post even further unless someone requests examples in the comments.

  • Correct and uniform indentation. D'oh. Obviously. Yet I see way too much code 100% unreadable due to crappy indentation, and even less readable when half of the code was indented with TABs by a person whose editor used 4 character tabs and another by a person whose editor used 8 character TABs. Just set your bloody editor to do soft (e.g. space-emulated) TABs and don't make others miserable.

  • Empty lines around logically separate units of code (both blocks and just sets of lines). You can write a 10000 line Java program in 1000 lines of good Perl. Now don't feel like Benedict Arnold if you add 100-200 empty lines to those 1000 to make things more readable.

  • Splitting uber-long expressions into multiple lines, closely followed by...

  • Correct vertical alignment. Witness the difference between:

    if ($some_variable > 11 && ($some_other_bigexpression < $another_variable || $my_flag eq "Y") && $this_is_too_bloody_wide == 1 && $ace > my_func() && $another_answer == 42 && $pi == 3) {
    

    and

    if ($some_variable > 11 && ($some_other_bigexpression < $another_variable || 
        $my_flag eq "Y") && $this_is_too_bloody_wide == 1 && $ace > my_func()
        && $another_answer == 42 && $pi == 3) {
    

    and

    if (   $some_variable > 11
        && ($some_other_bigexpression < $another_variable || $my_flag eq "Y")
        && $this_is_too_bloody_wide == 1
        && $ace > my_func()
        && $another_answer == 42
        && $pi == 3) {
    

    Personally, I prefer to fix the vertical alignment one more step by aligning LHS and RHS (this is especially readable in case of long SQL queries but also in Perl code itself, both the long conditionals like this one as well as many lines of assignments and hash/array initializations):

    if (   $some_variable               >  11
        && ($some_other_bigexpression   <  $another_variable || $my_flag eq "Y")
        && $this_is_too_bloody_wide    ==  1
        && $ace                         >  my_func()
        && $another_answer             ==  42
        && $pi                         ==  3  ) {
    

    As a side note, in some cases the code could be made even more readable/maintainable by not having such long expressions in the first place. E.g. if the contents of the if(){} block is a return, then doing multiple if/unless statements each of which has a return block may be better.
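
The last side note (several short checks with early returns instead of one long condition) could look roughly like the sketch below. The subroutine, the field names and the thresholds are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical filter: each condition gets its own line and its own
# early return, so no single expression grows too wide to read.
sub is_reportable_variant {
    my ($variant) = @_;

    return 0 unless $variant->{depth} > 11;
    return 0 unless $variant->{quality} >= 30 || $variant->{flag} eq 'Y';
    return 0 if     $variant->{filtered};

    return 1;
}

my %good_variant = ( depth => 20, quality => 10, flag => 'Y', filtered => 0 );
my %thin_variant = ( depth => 5,  quality => 40, flag => 'N', filtered => 0 );

print "good variant reportable: ", is_reportable_variant( \%good_variant ), "\n";
print "thin variant reportable: ", is_reportable_variant( \%thin_variant ), "\n";
```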

DVK
  • +1 Except that you might want to avoid writing an if statement with that many conditions in the first place. For example, early return from the subroutine should help you with that. I am happy with multiple returns in a subroutine. – Sinan Ünür Dec 04 '10 at 10:24
  • Also, wrt to deep hash dereferencing, yes, spaces help but so does NOT doing multi-level hash dereferencing more than once by assigning the deepest level you are going to dereference to a temporary variable. I know you know all of this but it might be useful to the reader. Give me a heads up if you choose to incorporate these comments so I can clean up. – Sinan Ünür Dec 04 '10 at 10:26
  • And, really, this should have been +`0xC0DE` for whitespace! – Sinan Ünür Dec 04 '10 at 10:33
  • Multiline vertical alignment is a little-discussed factor in helping make code more maintainable. It allows the similarities to fade in the background so the differences stand out. – tchrist Dec 04 '10 at 13:29
  • @tchrist - part of the reason I find it so obviously useful is the fact that my editor of choice (UltraEdit) allows rectangle-select and column-editing. So the benefit is in both readability and actual code changes. – DVK Dec 04 '10 at 18:00
  • I couldn't agree more on the readability aspect... `perltidy` is a great asset in this regard. – Zaid Dec 05 '10 at 19:46
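
The point in the comments above about deep hash dereferencing can be sketched as follows: dereference down to the deepest shared level once, keep that in a temporary variable, and work from there. The nested data structure is invented for this example.

```perl
use strict;
use warnings;

# Hypothetical nested data: sample -> gene -> measurements.
my $expression_data = {
    sample_A => {
        BRCA1 => { reads => 120, length => 5592 },
    },
};

# Repeating the full path every time is noisy and easy to get wrong.
my $density_long_form = $expression_data->{sample_A}{BRCA1}{reads}
                      / $expression_data->{sample_A}{BRCA1}{length};

# Dereference once into a temporary, then use the short name.
my $gene         = $expression_data->{sample_A}{BRCA1};
my $read_density = $gene->{reads} / $gene->{length};

printf "read density: %.4f\n", $read_density;
```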
4

i see this as an issue of people being told that perl is unreadable, and they start to make assumptions about the maintainability of their own code. if you are conscientious enough to consider readability as a hallmark of quality code, chances are this critique doesn't apply to you.

most people will cite regexes when they discuss readability. regexes are a dsl embedded in perl and you can either read them or not. if someone can't take the time to understand something so basic and essential to many languages, i'm not concerned about trying to bridge some inferred cognitive gap...they should just man up, read the perldocs, and ask questions where necessary.

others will cite perl's use of short-form vars such as @_, $! etc. these are all easily disambiguated...i'm not interested in making perl look like java.

the upside of all of these quirks and perlisms is that codebases written in the language are often terse and compact. i'd rather read ten lines of perl than one hundred lines of java.

to me there is so much more to "maintainability" than simply having easy-to-read code. write tests, make assertions...do everything else you can do to lean on perl and its ecosystem to keep code correct.

in short: write programs to be first correct, then secure, then well-performing....once these goals have been met, then worry about making it nice to curl up with near a fire.

Brad Clawsie
  • @Brad - do you have any experience with developing perl in enterprise setting? (which means (1) Collaboration from MANY people to the same codebase; (2) Code gets maintained a LOT; (3) The costs of breaking stuff when maintaining code are VERY high) – DVK Dec 03 '10 at 21:50
  • @Brad - in such a setting, i'd EASILY take somewhat correct, insecure, poorly performing code that **I can easily read and therefore maintain** without investing half my life into grokking it, because then I can fix all of the above in well-written code at small cost, but the cost of fixing poorly written code is often pretty much equivalent to a full rewrite – DVK Dec 03 '10 at 21:52
  • Even if you know regexes (and I love them don't get me wrong), documenting in your code what you are using them for and what you expect them to find and store and how it does that can be of great use later. – Joel Berger Dec 03 '10 at 22:02
  • you're all talking like i am telling you to throw code quality to the wind. i'm not. i'm saying to use perl as perl. good perl will always read differently than other languages, but there is an expectation that people reading it know the "perl way". come on, were this lisp or even C, the answer would be the same - you can't read what you can't understand, don't blame the author. – Brad Clawsie Dec 03 '10 at 22:09
  • @brad - what we are talking about is that following your advice and writing "first correct, then secure, then well-performing" code which is NOT readable is a bad idea since making that code readable later on is a VERY expensive proposition which in the real world will never get resources allocated to it. Which effectively throws code quality to the wind, whether your wording intended to imply that or not. – DVK Dec 04 '10 at 02:07
  • @brad - also, upon re-reading your question, sorry, but you're making strawman arguments here. VERY few people with a clue consider regexes or the SPECIFIC variables you cite (@_, $!) as the main reason Perl code is unreadable, although commented regexes are a marked benefit in code readability over uncommented ones. There's a major difference between concise well written Perl code using full Perl capabilities and golf code using bad style, and this question was clearly about the latter whereas you offer a spirited defense of the former.... – DVK Dec 04 '10 at 02:10
  • I think I'd rather read 100 lines of dense Perl than 10 lines of Java. Java is the write-once-have-to-retire-with-repetitive-strain-injury language. – Orbling Dec 04 '10 at 02:15
  • @brad - ...Most of what you say is true by the way. (except the priority/importance of code readability that I took exception to in my original comment). Just to be clear. – DVK Dec 04 '10 at 02:16
  • @Orbling - People tell me that Eclipse is supposed to solve part of RSI-injury problem for Java, but I was spared the painful necessity of testing that hypothesis so far so can't vouch for its accuracy. – DVK Dec 04 '10 at 02:18
  • @DVK On a similar conversation on programmers this week people were defending Java as being easy with suitable CASE tools that write 95% of your code for you. My reaction to that is that implies about 95% of your code is redundant and the language is terribly verbose. (Incidentally, this whole question should be on programmers.) – Orbling Dec 04 '10 at 02:22
2

I would say the packaging/object model, which is reflected in the directory structure of the .pm files. For my PhD I wrote quite a lot of Perl code that I have reused since. It was for an automatic LaTeX diagram generator.
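
For readers new to Perl packaging, the mapping between package names and the directory structure is mechanical: `::` becomes a directory separator and `.pm` is appended, relative to a directory in `@INC` (often `lib/`). The small sketch below only illustrates that rule; the package names are hypothetical and are not the ones from the diagram generator mentioned above.

```perl
use strict;
use warnings;

# Turn a package name into the relative path where "use" looks for it,
# relative to a directory in @INC such as lib/.
sub module_path {
    my ($package_name) = @_;
    ( my $relative_path = $package_name ) =~ s{::}{/}g;
    return "$relative_path.pm";
}

# Hypothetical package names, invented for this example.
for my $package_name (qw(Diagram::Node Diagram::Edge Diagram::Export::LaTeX)) {
    printf "%-24s lives in lib/%s\n", $package_name, module_path($package_name);
}
```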

Diego Sevilla
1

I'll mention some positive things that make Perl maintainable.

It's true that you usually shouldn't get too clever with really dense statements a la return !$@;#% and the like, but a good amount of cleverness with list-processing operators, like map and grep, and with list-context returns from the likes of split and similar operators, in order to write code in a functional style, can make a positive contribution to maintainability. At my last employer we also had some snazzy hash-manipulation functions that worked in a similar way (hashmap and hashgrep, though technically we only fed them even-sized lists). For instance:

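# Look for all the servers, and return them in a pipe-separated string
# (because we want this for some lame reason or another).
# The explicit sort block keeps Perl from parsing "hashmap" as the name
# of a comparison subroutine ("sort SUBNAME LIST").
return join '|', 
       sort { $a cmp $b }
       hashmap { $a =~ /^server_/ ? $b : +() } 
       %configuration_hash;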
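
hashmap was an in-house function and is not shown here. As a stand-in, the sketch below uses `pairmap` from the core `List::Util` module (version 1.29 or later), which also sets `$a` and `$b` to each key and value; whether it behaves exactly like the original hashmap is an assumption. The contents of `%configuration_hash` and the `server_list` wrapper are invented for the example. The sort is given an explicit comparison block, since a bare function name directly after `sort` would be taken as the name of the comparison subroutine.

```perl
use strict;
use warnings;
use List::Util 1.29 qw(pairmap);

# Hypothetical configuration, made up for this illustration.
my %configuration_hash = (
    server_alpha => 'alpha.example.org',
    server_beta  => 'beta.example.org',
    timeout      => 30,
);

# Collect the values of all server_* keys into a pipe-separated string.
sub server_list {
    my (%config) = @_;
    return join '|',
           sort { $a cmp $b }
           pairmap { $a =~ /^server_/ ? $b : () }
           %config;
}

print server_list(%configuration_hash), "\n";
```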

See also Higher Order Perl, http://hop.perl.plover.com - good use of metaprogramming can make defining tasks more coherent and readable, if you can keep the metaprogramming itself from getting in the way.