
my application is logging a large volume of video and i2c sensor data to a disk file, as fast as possible. Currently I am converting everything to bytes and writing with a BufferedOutputStream. @Siguza was kind enough to suggest looking into a GZIPOutputStream to accomplish the deed. i was wondering whether you had any thoughts on the performance pros and cons ... i am thinking the processor is way ahead and the disk write is the bottleneck, so i am hoping that compressing on the fly via a GZIPOutputStream before the write might be a good strategy. any thoughts on this greatly welcome.
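
for concreteness, here is a minimal sketch of the stream setup i have in mind - the file name and record fields are just placeholders, and DataOutputStream only stands in for my own byte conversion:

long timestampMillis = System.currentTimeMillis();  // illustrative fields, not my real record layout
short i2cSensorValue = 0;
String label = "frame";

DataOutputStream out = new DataOutputStream(
        new BufferedOutputStream(
                new GZIPOutputStream(new FileOutputStream("sensorlog.dat"))));
out.writeLong(timestampMillis);     // 8 bytes
out.writeShort(i2cSensorValue);     // 2 bytes
out.writeUTF(label);                // 2-byte length followed by the string bytes
out.close();                        // finishes the gzip stream and writes the trailer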

Added: in response to comments ...

turns out zipping is not that processor-expensive ... and the way i had asked the original question was not great, as Erwin rightly pointed out. the question about zipping performance is not BufferedOutputStream versus GZIPOutputStream - both the zipped and the unzipped streams get wrapped in a BufferedOutputStream - but how much cost is added if the underlying FileOutputStream is wrapped in a GZIPOutputStream first, before it is wrapped in the BufferedOutputStream. here is the answer, using this code:

byte[] bs = RHUtilities.toByteArray((int) 1);   // one 4-byte record
boolean zipped = false;

FileOutputStream fos = new FileOutputStream(datFile);
BufferedOutputStream bos = null;
if (zipped) {
    // compress on the fly: FileOutputStream -> GZIPOutputStream -> BufferedOutputStream
    GZIPOutputStream gz = new GZIPOutputStream(fos);
    bos = new BufferedOutputStream(gz);
} else {
    bos = new BufferedOutputStream(fos);
}
long startT = System.currentTimeMillis();
for (int i = 0; i < 1000000; i++)
    bos.write(bs);
bos.flush();
System.out.println(System.currentTimeMillis() - startT);
// note: in the zipped case flush() does not finish the gzip stream;
// the remaining compressed data and the trailer are only written by close()
bos.close();

my 2012 MacBook Pro does a write of 1M ints with

zipped=true in 38ms - file size 4KB
zipped=false in 21ms - file size 4MB

and, yes, i like the compression :-)

read performance is almost identical (83 vs 86ms) between

FileInputStream fin = new FileInputStream(datFile);

and

GZIPInputStream gin = new GZIPInputStream(new FileInputStream(datFile));
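
for reference, a minimal sketch of such a read test (datFile and zipped as in the write test above; it assumes the 4-byte ints are big-endian, which is what DataInputStream.readInt expects):

InputStream raw = new FileInputStream(datFile);
DataInputStream din = new DataInputStream(zipped
        ? new BufferedInputStream(new GZIPInputStream(raw))
        : new BufferedInputStream(raw));
long startT = System.currentTimeMillis();
try {
    while (true)
        din.readInt();                  // one 4-byte record per iteration
} catch (EOFException eof) {
    // end of stream reached
}
System.out.println(System.currentTimeMillis() - startT);
din.close();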

all good ...

Robert Huber
    What stops you from trying with a test? Anyhow BufferedOutputStream and GZIPOutputStream are entirely different beasts and can be combined. The former optimizes things if you have many small writes, the latter reduces space consumption for data that it can compress - and video is much better compressed with specialized video codecs. They're not comparable. – Erwin Bolwidt Sep 04 '17 at 01:28
  • The question also arises whether you *want* the files compressed or not. Will the downstream system be able to decompress them? – user207421 Sep 04 '17 at 01:48
  • thanks a bunch, i am not compressing the video streams themselves, just data extracted from them - like the info that goes into a heads-up display or that gets overlaid back onto the stream to augment it. data are all bytes, ints, and strings. each individual record is around 20 bytes. 10 min at 10 fps is 400kB as text, 120kB when reduced to the important bytes, 26kB when reduced and zipped. bandwidth savings definitely look great, just the question of how best to optimize performance here is less clear to me. the decompression step is not a major worry for me as it is less time critical. – Robert Huber Sep 04 '17 at 10:06
  • anyone see any major drawbacks for using GZIPOutputStream for this? much obliged – Robert Huber Sep 04 '17 at 10:09

1 Answer


There are a whole lot of issues raised by this question:

i am thinking the processor is way ahead and the disk write is the bottleneck

"I am thinking" is not a sound basis for optimizing performance. You need to do some measurements to find out where the bottleneck actually is. (If your "thinking" is wrong, then changing to GZipOutputStream is liable to make things worse.)

Alternatively, just try it, and measure whether it improves performance or not.
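
For example, a minimal timing sketch (the file name and record bytes below are placeholders) that deliberately includes close() in the timed region, so that the cost of finishing the GZIP stream is counted:

byte[] record = {0, 0, 0, 1};                // placeholder 4-byte record
long start = System.nanoTime();
try (OutputStream out = new BufferedOutputStream(
        new GZIPOutputStream(new FileOutputStream("test.dat")))) {
    for (int i = 0; i < 1_000_000; i++) {
        out.write(record);
    }
}   // try-with-resources closes (and finishes) the gzip stream here
System.out.println((System.nanoTime() - start) / 1_000_000 + " ms");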

From a theoretical perspective, if there were a significant mismatch between processor and disk speed, then compression could help. And one possible upside is that compression could also save disk space.

But the downsides are:

  • compression is relatively expensive (and so is decompression), so you may end up spending more (elapsed) time than you gain by reducing I/O,
  • compression is ineffective on small files,
  • format-agnostic compression is not very effective on raw (uncompressed) audio or video data¹, and
  • if your video data is already compressed, then a second compression will achieve nothing.

Finally, it could be a "lots of small files" problem. If you attempt to read and write lots of little files, the bottleneck is likely to not be raw disk speed. Rather, it is likely to be the OS's ability to read and write directories and/or file metadata. If that is where your problem is, then you should be looking at bundling the "lots of little files" into archives; e.g. TAR or ZIP files. There are libraries for doing this in Java.
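
For example, a minimal sketch using the standard java.util.zip and java.nio.file APIs (the directory and archive names are placeholders):

File[] files = new File("logdir").listFiles();   // assumed directory of small log files
try (ZipOutputStream zos = new ZipOutputStream(
        new BufferedOutputStream(new FileOutputStream("logs.zip")))) {
    for (File f : files) {
        zos.putNextEntry(new ZipEntry(f.getName()));
        Files.copy(f.toPath(), zos);             // copy the file's bytes into this entry
        zos.closeEntry();
    }
}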

And another benefit of archives is that they can make compression more effective.


¹ For background, read https://en.wikipedia.org/wiki/Lossless_compression and https://en.wikipedia.org/wiki/List_of_codecs#Lossless_video_compression

Stephen C
  • Stephen, thanks a bunch, that was exactly what i needed to understand a little more of the pros and cons of the different approaches. my files are single, large files consisting of logged ints, shorts and strings - 100s of MBs in total - and they accumulate from many small writes of around 20 bytes each. thanks again for giving me a much better idea of what the parameters for my testing need to be. i will get started on the performance testing to see what cost the zipping on the fly imposes and will report what i find. – Robert Huber Sep 04 '17 at 13:45