I'm writing code to divide up (big) text strings "from the back" in perl (i.e. end-to-start or right-to-left) into equal-size chunks (with remainder at the front).

It's working, but this seems to be a case where perl's "it's easy/fast to do (conceptually) easy things" paradigm is breaking down.

The most elegant way I found is adapted from here: How do I display large numbers with commas?

my @a = split /(?=(?:.{8})+$)/,$a;

But this is very slow as the strings get very large, probably due to all the necessary backtracking. Is there maybe a more efficient way using the same idea (or any regexp)?

I rejected the idea of "reverse input, process forward, reverse output" out of hand because of similar efficiency concerns. But I'd welcome correction on those concerns if anyone knows better.
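For anyone who wants to measure the reverse-based idea rather than guess, here is a minimal sketch of what it might look like. The sub name `chunk_from_back_reverse` and the `$n` chunk-size parameter are made up for illustration, and the sketch makes no claim about speed; that would need benchmarking against the other variants.

```perl
use strict;
use warnings;

# Hypothetical helper (name and interface are illustrative only):
# reverse the string, cut it into chunks from the left, then undo
# both reversals so the short chunk ends up at the front.
sub chunk_from_back_reverse {
    my ($str, $n) = @_;
    my $rev = reverse $str;                 # scalar context: reverses the characters
    my @rev_chunks = $rev =~ /.{1,$n}/gs;   # greedy left-to-right chunks of up to $n chars
    # Reverse each chunk's characters, then reverse the order of the chunks.
    return reverse map { scalar reverse $_ } @rev_chunks;
}

print join(",", chunk_from_back_reverse("12345678901234567890", 8)), "\n";
# prints 1234,56789012,34567890
```

The string is copied and reversed twice overall (once whole, once per chunk), which is the overhead the concern above is about.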

I did do a brute-force "iteration of substr's" implementation, which was fine but inelegant.
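A minimal sketch of what such a `substr` loop might look like (this is not the original code; the sub name `chunk_from_back_substr` and the `$n` parameter are assumptions for illustration). It takes the short remainder chunk first, then walks forward in fixed steps:

```perl
use strict;
use warnings;

# Hypothetical helper: split $str into $n-character chunks aligned to the
# right-hand end, so any short remainder becomes the first element.
sub chunk_from_back_substr {
    my ($str, $n) = @_;
    my $len = length $str;
    my $r   = $len % $n;                    # length of the leading remainder
    my @chunks;
    push @chunks, substr($str, 0, $r) if $r;
    for (my $pos = $r; $pos < $len; $pos += $n) {
        push @chunks, substr($str, $pos, $n);
    }
    return @chunks;
}

print join(",", chunk_from_back_substr("12345678901234567890", 8)), "\n";
# prints 1234,56789012,34567890
```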

Only slightly less elegant but also slightly faster is an implementation using unpack I've currently got running, as adapted from here: Split a String into Equal Length Chunk in Perl

use integer;
my $la = length($a);
my $r = $la % 8;
my @a = unpack(($r?"a$r":"")."(a8)"x($la/8), $a);

Pretty ugly. Even the seeming simplification "(a8)*" (instead of the x repetition) fails: when the length is less than 8, say 5, and the unpack template is "a5(a8)*", perl for some reason returns an extra "" at the end. (Anybody have an explanation for that "feature"? :-S)

Any better ideas for simplification without introducing inefficiency? Thanks.

Jeff Y
  • In case anyone is interested, the application is bignum arithmetic, using the chunking to do "native" arithmetic where possible. – Jeff Y Nov 15 '15 at 16:22
  • in that case, have you considered storing your numbers backwards? :) – hobbs Nov 15 '15 at 16:30
  • @hobbs Yes, that was my latest thought -- just represent everything backwards internally. Until I realized that I'd have to re-reverse on every operation's arguments (inner loop) to feed them to "native" arithmetic. Sort of defeating the purpose. – Jeff Y Nov 15 '15 at 16:34
  • I found a slight improvement: the parens around `(a8)` are superfluous. (Parens were left over from `(a8)*`). – Jeff Y Nov 15 '15 at 18:25
  • Alternative improvement: the parens can be left in and `x` changed to `.` instead, for a space improvement. – Jeff Y Nov 15 '15 at 18:42
  • @JeffY, since you care about performance and the task is solvable with `unpack`... IME there is nothing faster than `pack`/`unpack` at processing raw data in Perl (except custom XS). Yes, it looks ugly, but that can be solved by moving the code into a function. ( Benchmark: http://pastebin.com/eByQ4z3S ) – Dummy00001 Nov 16 '15 at 16:56

1 Answer

Best by test:

use integer;
my $la = length($a);
my $r = $la % 8;
my @a = unpack(($r?"a$r":"")."(a8)".($la/8), $a);

There seems to be no cleaner way to do this efficiently.

Explanation:

use integer; is so that ($la/8) is truncated to integer. int($la/8) would do the same thing.

$r is the "remainder", the amount of remaining string after "dividing" it into chunks of 8.

If the string length is evenly divisible by 8 ($r==0), there must be no "remainder" part in unpack's template; otherwise the remainder part is "a$r". Hence: ($r?"a$r":"")

The "quotient", or chunking, part of unpack's template is: "(a8)".($la/8)

The last line can be replaced with the following for cleaner-looking code, at the cost of a couple more variables:

my $q = $la / 8;
my $tr = $r ? "a$r" : "";
my @a = unpack "$tr(a8)$q", $a;
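As suggested in the comments, the ugliness can be hidden by moving the code into a function. The following is only a sketch of that idea: the sub name `chunk_from_back` and the `$n` chunk-size parameter are hypothetical, and `int()` is used in place of `use integer` so the truncation stays local to one expression.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the unpack approach above.
# Returns $str cut into $n-character chunks, counted from the right,
# with any shorter remainder as the first element.
sub chunk_from_back {
    my ($str, $n) = @_;
    my $len = length $str;
    my $r   = $len % $n;           # length of the leading remainder chunk
    my $q   = int($len / $n);      # number of full-size chunks
    my $tr  = $r ? "a$r" : "";     # remainder part of the template, if any
    return unpack "$tr(a$n)$q", $str;
}

my @parts = chunk_from_back("12345678901234567890", 8);
print join(",", @parts), "\n";
# prints 1234,56789012,34567890
```

For a 20-character string and a chunk size of 8, the template built here is "a4(a8)2".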
Jeff Y