For comparsion, I tried to test different versions for solving your actual problem of compressing the string. Here is my test script test.pl
:
use strict;
use warnings;
use Benchmark qw(cmpthese);
use Inline C => './compress_c.c';
my $str_len = 10000;
my @chars = qw(a b c d);
my $str;
$str .= [@chars]->[rand 4] for 1 .. $str_len;
cmpthese(
-1,
{
compress_array => sub { compress_array( $str ) },
compress_regex => sub { compress_regex( $str ) },
compress_str => sub { compress_str( $str ) },
compress_c => sub { compress_c( $str ) },
}
);
# Suggested by @melpomene in the comments
sub compress_regex {
return $_[0] =~ s/([a-z])\1+/($+[0] - $-[0]) . $1/egr;
}
sub compress_array {
my $result = '';
my @chrs = split //, $_[0];
my $prev = $chrs[0];
my $count = 1;
my @result;
for my $i ( 1..$#chrs ) {
my $char = $chrs[$i];
if ( $prev eq $char ) {
$count++;
next if $i < $#chrs;
}
if ( $count > 1) {
push @result, $count, $prev;
}
else {
push @result, $prev;
}
if ( ( $i == $#chrs ) and ( $prev ne $char ) ) {
push @result, $char;
last;
}
$count = 1;
$prev = $char;
}
return join '', @result;
}
sub compress_str {
my $result = '';
my $prev = substr $_[0], 0, 1;
my $count = 1;
my $lastind = (length $_[0]) - 1;
for my $i (1 .. $lastind) {
my $char = substr $_[0], $i, 1;
if ( $prev eq $char ) {
$count++;
next if $i < $lastind;
}
if ( $count > 1) {
$result .= $count;
}
$result .= $prev;
if ( ( $i == $lastind ) and ( $prev ne $char ) ) {
$result .= $char;
last;
}
$count = 1;
$prev = $char;
}
return $result;
}
where compress_c.c
is:
SV *compress_c(SV* str_sv) {
STRLEN len;
char* str = SvPVbyte(str_sv, len);
SV* result = newSV(len);
char *buf = SvPVX(result);
char prev = str[0];
int count = 1;
int j = 0;
int i;
for (i = 1; i < len; i++ )
{
char cur = str[i];
if ( prev == cur ) {
count++;
if ( i < (len - 1) )
continue;
}
if ( count > 1) {
buf[j++] = count + '0'; // assume count is less than 10
}
buf[j++] = prev;
if ( (i == (len - 1)) && (prev != cur) ) buf[j++] = cur;
count = 1;
prev = cur;
}
buf[j] = '\0';
SvPOK_on(result);
SvCUR_set(result, j);
return result;
}
The result of running perl test.pl
:
Rate compress_array compress_str compress_regex compress_c
compress_array 311/s -- -42% -45% -99%
compress_str 533/s 71% -- -6% -98%
compress_regex 570/s 83% 7% -- -98%
compress_c 30632/s 9746% 5644% 5273% --
Which shows that regex version is slightly faster than the string version. However, the C version is the fastest, and it is about 50 times as fast as the regex version.
Note: I tested this on my Ubuntu 16.10 laptop (Intel Core i7-7500U CPU @ 2.70GHz)