I have written a benchmarking script for GZIP.
It is not fully representative, because it:
- does not use exactly the same compressor/decompressor implementation that the Java program would be using
- does not operate under the exact same runtime conditions as the Java program
- does not test a variety of strings
- does not test a variety of string sizes
Nevertheless, it gives a useful heuristic.
plain="2016-11-09 20:56:02,469 ERROR [main] c.s.HelloExample - Something wrong with customer 'CUS-123e4567-e89b-12d3-a456-42665544'"
echo "Log message which we are using for this test is:"
echo "$plain"
echo "Time taken to echo this string 10,000 times:"
time for a in $(seq 1 10000);
do
echo "$plain" > /dev/null
done
echo "Time taken to echo and compress this string 10,000 times:"
time for a in $(seq 1 10000);
do
echo "$plain" | gzip -cf > /dev/null
done
echo "Time taken to echo, compress and decompress this string 10,000 times:"
time for a in $(seq 1 10000);
do
echo "$plain" | gzip -cf | gzip -cfd > /dev/null
done
Here's how the measurements came out:
Log message which we are using for this test is:
2016-11-09 20:56:02,469 ERROR [main] c.s.HelloExample - Something wrong with customer 'CUS-123e4567-e89b-12d3-a456-42665544'
Time taken to echo this string 10,000 times:
real 0m1.940s
user 0m0.591s
sys 0m1.333s
user+sys 0m1.924s
Time taken to echo and compress this string 10,000 times:
real 0m22.028s
user 0m11.309s
sys 0m17.325s
user+sys 0m28.634s
Time taken to echo, compress and decompress this string 10,000 times:
real 0m22.983s
user 0m18.761s
sys 0m27.322s
user+sys 0m46.083s
[Finished in 47.0s real time]
User+sys shows how much CPU time was used; that's the bit that is important for working out how computationally intensive this is.
So, echoing and compressing takes about 14.9x the CPU time of just echoing the string raw (28.634s vs 1.924s).
Echoing, compressing and decompressing takes about 24.0x the CPU time of just echoing the string raw (46.083s vs 1.924s). This is only 1.6x the CPU time of compressing alone.
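For anyone who wants to check the arithmetic, here is a small sketch that recomputes those ratios from the user+sys totals. The helper name `cpu_ratio` and the variable names are my own, chosen only for illustration; the numbers are copied from the measurements above.

```shell
#!/usr/bin/env bash

# Ratio of two CPU times (in seconds), printed to one decimal place.
# LC_ALL=C keeps the decimal separator as "." regardless of locale.
cpu_ratio() {
  LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

# user+sys totals copied from the measurements above
echo_only=1.924
echo_gzip=28.634
echo_gzip_gunzip=46.083

echo "compress vs echo:                $(cpu_ratio "$echo_gzip" "$echo_only")x"
echo "compress+decompress vs echo:     $(cpu_ratio "$echo_gzip_gunzip" "$echo_only")x"
echo "compress+decompress vs compress: $(cpu_ratio "$echo_gzip_gunzip" "$echo_gzip")x"
```

This prints 14.9x, 24.0x and 1.6x respectively.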
Conclusions:
- It's not cheap to compress even a tiny input with GZIP.
- GZIP decompression is cheap!
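Caution: this test may, in reality, have been measuring mostly the startup and cleanup costs of the gzip executable, since every iteration launches a new process. I am not sure how significant those costs are, but we can certainly see that work was happening in parallel (user + sys > real in the compression cases). GNU gzip itself is single-threaded, so the likelier explanation is that the stages of each pipeline run as separate processes on different cores, rather than gzip starting pthreads.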
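One rough way to see how much of the cost is process startup is to compare one gzip process per message against a single gzip process that compresses all the messages as one stream. The following is only a sketch: the function names and the iteration-count argument `n` are my own inventions, and the two cases are not strictly equivalent, because a single stream also lets gzip exploit the repetition between messages.

```shell
#!/usr/bin/env bash
plain="2016-11-09 20:56:02,469 ERROR [main] c.s.HelloExample - Something wrong with customer 'CUS-123e4567-e89b-12d3-a456-42665544'"
n="${1:-1000}"   # iteration count (assumed default; pass 10000 to match the test above)

# Case A: one gzip process per message, so startup cost is paid n times.
per_message() {
  for i in $(seq 1 "$n"); do
    printf '%s\n' "$plain" | gzip -cf > /dev/null
  done
}

# Case B: a single gzip process compresses all n messages as one stream,
# so startup cost is paid once.
single_stream() {
  for i in $(seq 1 "$n"); do
    printf '%s\n' "$plain"
  done | gzip -cf > /dev/null
}

echo "One gzip process per message, $n messages:"
time per_message
echo "One gzip process for all $n messages:"
time single_stream
```

If case B comes out dramatically cheaper, that would suggest the original numbers are dominated by process creation rather than by the compression work itself.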
I was not able to find a conclusive answer on how GZIP's running time scales with the size of the input, but it would be interesting to know.
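In the absence of a definitive answer, the scaling can at least be measured empirically. Here is a minimal sketch that times a single gzip invocation on inputs of increasing size. The helper `make_input` and the chosen sizes are hypothetical, and the input is just the log line repeated, which is far more compressible than real log data, so treat the results as a rough guide only.

```shell
#!/usr/bin/env bash
plain="2016-11-09 20:56:02,469 ERROR [main] c.s.HelloExample - Something wrong with customer 'CUS-123e4567-e89b-12d3-a456-42665544'"

# Emit exactly $1 bytes by repeating the log line (hypothetical helper).
make_input() {
  yes "$plain" 2>/dev/null | head -c "$1"
}

tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT

# Sizes in bytes; arbitrary choices spanning four orders of magnitude.
for size in 1000 100000 10000000; do
  # Generate the input first so that only gzip itself is timed.
  make_input "$size" > "$tmp"
  echo "Compressing $size bytes:"
  time gzip -cf < "$tmp" > /dev/null
done
```

Comparing the user+sys figures across the sizes shows how the cost grows once the fixed startup overhead is amortised.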