Like many an unfortunate programmer soul before me, I am currently dealing with an archaic file format that refuses to die. I'm talking circa-1970-format-specification archaic. If it were solely up to me, we would throw out both the file format and every tool that ever knew how to handle it, and start from scratch. I can dream, but unfortunately that won't resolve my issue.
The format: Pretty loosely defined, as years of nonsensical revisions have destroyed almost all the backward compatibility it once had. Basically, the only constant is that there are section headings, with few rules about what comes before or after these lines. The headings always appear in a fixed order (e.g. HEADING1, HEADING2, HEADING3, ...), but they are not numbered and none of them is required (e.g. HEADING1, HEADING3, HEADING7). Thankfully, all possible heading permutations are known. Here's a fake example:
```
# Bunch of comments
SHOES # First heading
# bunch of text and numbers here
HATS # Second heading
# bunch of text here
SUNGLASSES # Third heading
...
```
My problem: I need to concatenate several of these files section by section, i.e. all the SHOES sections together, all the HATS sections together, and so on. I have a Perl script that does this quite nicely:
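```perl
use strict;
use warnings;

# One accumulator string per section; $r points at the one being filled.
# name() is a helper defined elsewhere in the script.
my ($preamble, $shoes, $hats, $sung, $dress, $end);
my $r = \$preamble;    # lines before the first heading land here

while (my $l = <>) {
    if    ($l =~ /^SHOES/i)                    { $r = \$shoes; name($r); }
    elsif ($l =~ /^HATS/i)                     { $r = \$hats;  name($r); }
    elsif ($l =~ /^SUNGLASSES/i)               { $r = \$sung;  name($r); }
    elsif ($l =~ /^DRESS/i || $l =~ /^SKIRT/i) { $r = \$dress; name($r); }
    # ... (remaining headings elided) ...
    elsif ($l =~ /^END/i)                      { $r = \$end;   name($r); }
    else {
        $$r .= $l;     # append the line to whichever string $r points at
    }
    print STDERR "Finished processing $ARGV\n" if eof;
}
```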
As you can see, the Perl script simply repoints a reference whenever a line matches a heading, then appends every following line to the string that reference points at, until the next heading matches. The section strings are printed out later as one big concatenated file.
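A minimal Python sketch of the same idea, under a few assumptions: the `HEADINGS` table is hypothetical and lists only the headings shown above, the output writes the canonical heading name in place of whatever `name()` does, and lines before the first heading of each file are dropped. The trick that replaces the Perl reference is to accumulate each section in a *list* (a mutable object) and rebind a plain name, `current`, to whichever list is being filled.

```python
import re
import sys

# Hypothetical heading table: canonical section name -> pattern for its
# heading line. Only the headings from the question are listed; the
# dict's insertion order doubles as the output order.
HEADINGS = {
    "SHOES": re.compile(r"^SHOES", re.IGNORECASE),
    "HATS": re.compile(r"^HATS", re.IGNORECASE),
    "SUNGLASSES": re.compile(r"^SUNGLASSES", re.IGNORECASE),
    "DRESS": re.compile(r"^(?:DRESS|SKIRT)", re.IGNORECASE),
    "END": re.compile(r"^END", re.IGNORECASE),
}


def heading_of(line):
    """Return the canonical section name if line is a heading, else None."""
    for name, pattern in HEADINGS.items():
        if pattern.match(line):
            return name
    return None


def merge(files):
    """Group lines from several files by section, like the Perl $r trick.

    files: an iterable of line iterables, one per input file.
    """
    sections = {}                      # section name -> list of body lines
    for lines in files:
        current = None                 # assumption: drop lines before the first heading
        for line in lines:
            name = heading_of(line)
            if name is not None:
                # Rebind `current` to that section's list (Perl: $r = \$shoes).
                current = sections.setdefault(name, [])
            elif current is not None:
                # Mutate the list `current` is bound to (Perl: $$r .= $l).
                current.append(line)
    return sections


def write_merged(sections, out):
    """Write each section that appeared, in canonical heading order."""
    for name in HEADINGS:
        if name in sections:
            out.write(name + "\n")     # assumption: canonical name as the heading line
            out.writelines(sections[name])


def main(paths):
    def each_file():
        for path in paths:
            with open(path) as fh:
                yield fh               # merge() reads it fully before the next file opens
            print(f"Finished processing {path}", file=sys.stderr)

    write_merged(merge(each_file()), sys.stdout)


if __name__ == "__main__":
    main(sys.argv[1:])
```

Adding a heading then means adding one entry to the table rather than another `elif` branch, and `merge()` takes any iterables of lines, so it can be exercised with plain lists.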
I would and could stick with Perl, but my needs are becoming more complex every day and I would really like to see how this problem can be solved elegantly in Python (can it?). Right now, my Python approach is to load each entire file as a string, search for the heading positions, split the string at those indices, and concatenate the pieces. That takes a lot of regexes, if statements and variables for something that seems so simple in another language.
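If you keep the whole-file approach, `re.split` with a capturing group can do the index bookkeeping: with `re.MULTILINE`, `^` matches at every line start, and the captured keyword is returned between the pieces. This is a sketch; `HEADING_RE`, `ALIASES` and `ORDER` are hypothetical and cover only the headings from the question, and text before the first heading is dropped.

```python
import re

# Hypothetical: every known heading keyword in one alternation, anchored at
# the start of a line (re.MULTILINE). The capture group makes re.split keep
# the keyword; the rest of the heading line is discarded.
HEADING_RE = re.compile(r"^(SHOES|HATS|SUNGLASSES|DRESS|SKIRT|END).*\n?",
                        re.IGNORECASE | re.MULTILINE)
ALIASES = {"SKIRT": "DRESS"}                      # hypothetical alias table
ORDER = ["SHOES", "HATS", "SUNGLASSES", "DRESS", "END"]


def split_sections(text):
    """Return [(keyword, body), ...] in file order; text before the first heading is dropped."""
    parts = HEADING_RE.split(text)    # [preamble, kw1, body1, kw2, body2, ...]
    return list(zip(parts[1::2], parts[2::2]))


def concat_texts(texts):
    """Concatenate the bodies of same-named sections across whole-file strings."""
    merged = {}
    for text in texts:
        for keyword, body in split_sections(text):
            name = keyword.upper()
            name = ALIASES.get(name, name)
            merged[name] = merged.get(name, "") + body
    return "".join(name + "\n" + merged[name] for name in ORDER if name in merged)
```

Usage would be along the lines of `concat_texts(open(p).read() for p in paths)`.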
It seems that this really boils down to a fundamental language difference. I found a very nice SO discussion comparing Python's "call-by-object" semantics with languages that are call-by-reference: How do I pass a variable by reference? Yet I still can't think of an elegant way to do this in Python. If anyone can kick my brain in the right direction, it would be greatly appreciated.
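A small illustration of what call-by-object means for this problem: a Python name is bound to an object, so two names bound to the same *mutable* object (such as a list) see each other's mutations, which is the behaviour the Perl `$r` relies on. Strings, however, are immutable, so a direct translation of `$$r .= $l` onto a `str` silently does nothing to the original.

```python
# Python binds names to objects; there are no references to variables.
sections = {"SHOES": [], "HATS": []}

current = sections["SHOES"]   # bind a second name to the same list (Perl: $r = \$shoes)
current.append("red\n")       # mutating the list is visible through sections
current = sections["HATS"]    # rebinding `current` leaves sections untouched
current.append("fedora\n")

# Strings are immutable: += builds a new str and rebinds only the local name,
# so this is NOT an equivalent of Perl's $$r .= $l.
text_sections = {"SHOES": ""}
alias = text_sections["SHOES"]
alias += "red\n"

print(sections)                       # {'SHOES': ['red\n'], 'HATS': ['fedora\n']}
print(repr(text_sections["SHOES"]))   # ''
```

Collecting lines in lists (and joining them with `"".join(...)` at the end, if a single string is needed) is therefore the usual Python stand-in for appending through a string reference.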