I'm working on a reader/writer for DNG/TIFF files. Since there are several options for working with files in general (FileInputStream, FileChannel, RandomAccessFile), I'm wondering which strategy would best fit my needs.
A DNG/TIFF file is a composition of:
- some (5-20) small blocks (several tens to a few hundred bytes)
- very few (1-3) big continuous blocks of image data (up to 100 MiB)
- several (maybe 20-50) very small blocks (4-16 bytes)
The overall file size ranges from 15 MiB (compressed 14 bit raw data) up to about 100 MiB (uncompressed float data). The number of files to process is 50-400.
There are two usage patterns:
- Read all meta-data from all files (everything except the image data)
- Read all image data from all files
I'm currently using a FileChannel and performing a map() to obtain a MappedByteBuffer covering the whole file. This seems quite wasteful if I'm just interested in reading the meta-data. Another problem is freeing the mapped memory: when I pass slices of the mapped buffer around for parsing etc., the underlying MappedByteBuffer won't be collected.
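One mitigation would be to map only the region I actually need instead of the whole file. A minimal sketch; the 64 KiB bound for the metadata region is my own assumption, since TIFF IFDs can in principle live anywhere in the file:

import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch: map only a prefix of the file instead of the whole thing.
// The 64 KiB limit is an assumed upper bound for the metadata region;
// it is not something the TIFF format guarantees.
static MappedByteBuffer mapHeader(Path file) throws IOException {
    try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
        long size = Math.min(channel.size(), 64 * 1024);
        // The mapping stays valid even after the channel is closed.
        return channel.map(FileChannel.MapMode.READ_ONLY, 0, size);
    }
}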
I have now decided to copy smaller chunks via the FileChannel's read() methods and to map only the big raw-data regions. The downside is that reading a single value becomes surprisingly involved, because there is no readShort() and the like:
short readShort(long offset) throws IOException, InterruptedException {
    return read(offset, Short.BYTES).getShort();
}

ByteBuffer read(long offset, long byteCount) throws IOException, InterruptedException {
    // Allocate a heap buffer in the file's byte order and fill it completely.
    ByteBuffer buffer = ByteBuffer.allocate(Math.toIntExact(byteCount));
    buffer.order(GenericTiffFileReader.this.byteOrder);
    GenericTiffFileReader.this.readInto(buffer, offset);
    return buffer;
}

private void readInto(ByteBuffer buffer, long startOffset)
        throws IOException, InterruptedException {
    long offset = startOffset;
    while (buffer.hasRemaining()) {
        // Positional read; does not modify the channel's own position.
        int bytesRead = this.channel.read(buffer, offset);
        switch (bytesRead) {
            case 0:
                // Nothing was read this round; back off briefly and retry.
                Thread.sleep(10);
                break;
            case -1:
                throw new EOFException("unexpected end of file");
            default:
                offset += bytesRead;
        }
    }
    buffer.flip();
}
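For example, checking the fixed TIFF header with these helpers would look roughly like this (readInt() is a hypothetical sibling of readShort(), and the byte order is assumed to have already been set from the "II"/"MM" marker at offset 0):

// Usage sketch: validate the magic number and locate the first IFD.
if (readShort(2) != 42)
    throw new IOException("not a TIFF file");
long firstIfdOffset = Integer.toUnsignedLong(readInt(4));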
RandomAccessFile provides useful methods like readShort() or readFully(), but it cannot handle little-endian byte order.
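The best workaround I can think of is pairing readFully() with a wrapping ByteBuffer; a sketch:

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch: emulate a little-endian readShort() on top of RandomAccessFile
// by reading the raw bytes and re-interpreting them through a ByteBuffer.
static short readShortLE(RandomAccessFile file, long offset) throws IOException {
    byte[] bytes = new byte[Short.BYTES];
    file.seek(offset);
    file.readFully(bytes);
    return ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getShort();
}

This works, but it is hardly less verbose than the FileChannel variant above.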
So, is there an idiomatic way to handle scattered reads of single bytes and huge blocks? Is memory-mapping an entire 100 MiB file just to read a few hundred bytes wasteful or slow?