I want to can compress a string using the chars ASCII codes. I want to compress them using number patterns. Because ASCII codes are numbers, I want to find sub-patterns in the list of ASCII char codes.
Theory
This will be the format for every pattern I found:
- [nnn][n][nn], where:
- [nnn] is the ASCII code for first char, from group numbers with same pattern.
- [n] is a custom number for a certain pattern/rule (I will explain more below).
- [nn] shows how many times this rule happens.
The number patterns are not concretely established. But let me give you some examples:
- same char
- linear growth (every number/ascii is greater, with one, than previous)
- linear decrease (every number/ascii is smaller, with one, than previous)
Now let's see some situations:
- "adeflk" becomes "097.1.01-100.2.03-108.3.02"
- same char ones, linear growth three times, linear decrease twice.
- "rrrrrrrrrrr" becomes "114.1.11"
- same char eleven times.
- "tsrqpozh" becomes "116.3.06-122.1.01-104.1.01"
- linear decrease six times, same char ones, same char ones.
I added dots ('.') and dashes ('-') so you can see them easily.
Indeed, we don't see good results (compression). I want to use this algorithm for large strings. And adding more rules (number patterns) we increase changes for making shorter result than original. I know the existent compressing solutions. I want this solution because the result have only digits, and it helps me.
What I've tried
// recursive function
function run (string $data, array &$rules): string {
if (strlen($data) == 1) {
// str_pad for having always ASCII code with 3 digits
return (str_pad(ord($data), 3, '0', STR_PAD_LEFT) .'.'. '1' .'.'. '01');
}
$ord = ord($data); // first char
$strlen = strlen($data);
$nr = str_pad($ord, 3, '0', STR_PAD_LEFT); // str_pad for having always ASCII code with 3 digits
$result = '';
// compares every rule
foreach ($rules as $key => $rule) {
for ($i = 1; $i < $strlen; $i++) {
// check for how many times this rule matches
if (!$rule($ord, $data, $i)) {
// save the shortest result (so we can compress)
if (strlen($r = ($nr .'.'. $key .'.'. $i .' - '. run(substr($data, $i), $rules))) < strlen($result)
|| !$result) {
$result = $r;
}
continue 2; // we are going to next rule
}
}
// if comes here, it means entire $data follow this rule ($key)
if (strlen($r = (($nr .'.'. $key .'.'. $i))) < strlen($result)
|| !$result) {
$result = $r; // entire data follow this $rule
}
}
return $result; // it will return the shortest result it got
}
// ASCII compressor
function compress (string $data): string {
$rules = array( // ASCII rules
1 => function (int $ord, string $data, int $i): bool { // same char
return ($ord == ord($data[$i]));
},
2 => function (int $ord, string $data, int $i): bool { // linear growth
return (($ord+$i) == ord($data[$i]));
},
3 => function (int $ord, string $data, int $i): bool { // progressive growth
return ((ord($data[$i-1])+$i) == ord($data[$i]));
},
4 => function (int $ord, string $data, int $i): bool { // linear decrease
return (($ord-$i) == ord($data[$i]));
},
5 => function (int $ord, string $data, int $i): bool { // progressive decrease
return ((ord($data[$i-1])-$i) == ord($data[$i]));
}
);
// we use base64_encode because we want only ASCII chars
return run(base64_encode($data), $rules);
}
I added dots ('.') and dashes ('-') only for testing easily.
Results
compress("ana ar") => "089.1.1 - 087.1.1 - 053.1.1 - 104.1.1 - 073.1.1 - 071.4.2 - 121.1.01"
Which is ok. And it runs fast. Without a problem.
compress("ana aros") => Fatal error: Maximum execution time of 15 seconds exceeded
If string is a bit longer, it gets toooo much. It works fast and normal for 1-7 chars. But when there are more chars in string, that happens.
The algorithm doesn't run perfect and doesn't return the perfect 6-digit pattern, indeed. Before getting there, I'm stucked with that.
Question
How I can increase performance of this backtracking for running ok now and also with more rules?