I wrote a Java compressor for a super obscure compression format (it was mainly used on Amiga computers in the 1990s).
There is a fair amount of documentation on how to decompress the file format, but none on how to actually compress it.
So, I tried to write one myself. It works, but there's one issue: it takes 42 seconds to compress all the files I want to compress on "low-intensity" settings, and roughly 10x that on higher-intensity settings.
I believe it can be much faster than that.
It's basically an LZ77 sliding-window variant.
The real bottleneck is searching for an existing occurrence to compress against.
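(To illustrate the idea: an LZ77-style encoder replaces a run of bytes with a back-reference into data it has already emitted. For an input like ABCABCABC, it would output the literals ABC followed by, roughly, a token meaning "copy 6 bytes starting 3 bytes back". The exact token layout of this particular format doesn't matter for the question.)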
Right now, I'm using a Map<Byte, List<Integer>>, where the List<Integer> holds every index at which that byte occurs.
To find a potential match, it does the following: it takes the current index into the file being compressed, looks up the byte at that index in the Map to get its List<Integer> of positions, and then uses that list to find the longest run of bytes that already occurs in the file, simply by checking how long the data matches at each candidate position.
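For reference, maintaining that Map is cheap (it's just a plain HashMap of ArrayLists); a minimal sketch of the idea, with made-up names rather than the exact code from the full listing:

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, only to show the shape of the index: record a position the
// encoder has already moved past. Because positions are appended left to right,
// each per-byte list stays in ascending order, which is what the backwards walk
// (and the early break at minIndex) in search() relies on.
private static void recordPosition(Map<Byte, List<Integer>> dictionary, byte[] data, int position) {
    dictionary.computeIfAbsent(data[position], b -> new ArrayList<>())
              .add(position);
}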
I think a better data structure could speed this up significantly, but I'm stuck at this point.
One of the restrictions I have to work with is that I need to strictly adhere to this ancient compression format due to the nature of what this program is for.
How could I optimize the compression without making it less effective at packing data?
Main bottleneck code:
private static int search(byte[] data, int bufferEnd, List<Byte> target, Map<Byte, List<Integer>> dictionary) {
    // There's a certain point beyond which data will not be compressed. Calculating it
    // here saves a lot of overhead and prevents this from becoming O(n^2).
    int minIndex = Math.max(0, bufferEnd - getMaximumOffset(target.size()));

    byte test = target.get(0);
    if (!dictionary.containsKey(test))
        return -1; // No results found.

    List<Integer> possibleResults = dictionary.get(test);

    // Walk the recorded positions from newest to oldest.
    for (int i = possibleResults.size() - 1; i >= 0; i--) {
        int testIndex = possibleResults.get(i);
        if (minIndex > testIndex)
            break; // We've gone too far back; all remaining positions are even older.

        // Test whether the full target matches at this position.
        boolean pass = true;
        for (int j = 1; j < target.size(); j++) {
            if (target.get(j) != data[j + testIndex]) {
                pass = false;
                break; // Break from the j loop only.
            }
        }

        if (pass) // A match has been found. Return it.
            return testIndex;
    }
    return -1;
}
Which is called by:
// Find the longest compressible run: keep extending searchList by one byte
// for as long as search() can still find the extended sequence earlier in the data.
while ((tempIndex = search(data, i, searchList, dictionary)) >= 0) {
    if (data.length - 1 == readIndex) // If we've reached the end of the data, exit.
        break;
    searchList.add(data[++readIndex]);
}
Full code here for anyone who needs it.