40

I need to write a function that takes in some kind of input stream thing (e.g. an InputStream or a FileChannel) in order to read a large file in two passes: once to precompute some capacities, and second to do the "real" work. I do not want the whole file loaded into memory at once (unless it is small).

Is there an appropriate Java class that provides this capability? FileInputStream itself does not support mark()/reset(). BufferedInputStream does, I think, but I'm not clear whether it has to store the whole file to do this.

C is so simple, you just use fseek(), ftell(), and rewind(). :-(

animuson
Jason S
  • 5
    Jason, please un-accept my answer and take [this one.](http://stackoverflow.com/a/18665678/3474) It's good because it provides an efficient implementation of the standard markable `InputStream` API; any consumer of `InputStream` can use it without loading the whole file. – erickson May 02 '16 at 02:31

9 Answers

30

I think the answers referencing a FileChannel are on the mark.

Here's a sample implementation of an input stream that encapsulates this functionality. It uses delegation, so it's not a true FileInputStream, but it is an InputStream, which is usually sufficient. One could similarly extend FileInputStream if that's a requirement.

Not tested, use at your own risk :)

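import java.io.FileInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;

// Delegates reads to the wrapped FileInputStream and uses its FileChannel
// to remember and restore the file position, so nothing is buffered in memory.
public class MarkableFileInputStream extends FilterInputStream {
    private final FileChannel myFileChannel;
    private long mark = -1; // -1 means no mark has been set

    public MarkableFileInputStream(FileInputStream fis) {
        super(fis);
        myFileChannel = fis.getChannel();
    }

    @Override
    public boolean markSupported() {
        return true;
    }

    // readlimit is ignored: the mark is a file offset, so it does not expire.
    @Override
    public synchronized void mark(int readlimit) {
        try {
            mark = myFileChannel.position();
        } catch (IOException ex) {
            mark = -1; // mark() cannot throw; a later reset() reports "not marked"
        }
    }

    // Moves the shared file position back to the mark; the mark itself is kept.
    @Override
    public synchronized void reset() throws IOException {
        if (mark == -1) {
            throw new IOException("not marked");
        }
        myFileChannel.position(mark);
    }
}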
ykaganovich
  • 3
    This is by far the best solution. BufferedInputStream causes large portions of a file, or potentially ALL of it, to be double buffered. That is a huge waste. And RandomAccessFile doesn't inherit from InputStream, so it can't be a drop-in substitute wherever you are already using streams. This little class, however, should be extremely fast and memory efficient. – Adam Jun 03 '14 at 18:43
  • 1
    This worked great for me. I added a `mark(0);` to the constructor because I was getting a "not marked" error on the first call to `reset()` and, at least in my case, it makes sense for the default reset position to be 0. – Fernando Correia Nov 01 '14 at 00:34
  • 2
    This solution works great, with one small change. I would remove the "mark = -1" inside the reset method. The javadocs for reset give no indication that it should reset the mark, only the position. This allows mark to be called once, and then reset to be called multiple times, for example, when performing multiple retries. – Derek Lewis Feb 12 '15 at 19:30
  • 7
    Probably a rhetorical question: Why is this not the default behavior of FileInputStream? – stippi May 08 '19 at 08:30
25

BufferedInputStream supports mark by buffering the content in memory. It is best reserved for relatively small look-aheads of a predictable size.

Instead, RandomAccessFile can be used directly, or it could serve as the basis for a concrete InputStream, extended with a rewind() method.

Alternatively, a new FileInputStream can be opened for each pass.
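
As a rough illustration of the RandomAccessFile option, here is a minimal sketch of an InputStream backed by a RandomAccessFile with a rewind() method. The class name RewindableFileInputStream is made up for this example (it is not a JDK class), and the sketch assumes read-only access to a regular file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;

// Hypothetical class for illustration; not part of the JDK.
class RewindableFileInputStream extends InputStream {
    private final RandomAccessFile file;

    RewindableFileInputStream(String path) throws IOException {
        this.file = new RandomAccessFile(path, "r"); // "r" = read-only
    }

    @Override
    public int read() throws IOException {
        return file.read(); // -1 at end of file
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        return file.read(buf, off, len);
    }

    /** Moves the file pointer back to the start for another pass. */
    public void rewind() throws IOException {
        file.seek(0);
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

A caller would read to the end for the first pass, call rewind(), and read again for the second pass. Only the caller's own read buffer is held in memory, not the file.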

erickson
  • 2
    I'm switching to this answer, because I need to use an interface that I can share between regular files and in-memory buffers. Grrrrr. I'm writing my own interface RewindableStream + implementation classes, one of which wraps RandomAccessFile. – Jason S Jul 13 '09 at 18:14
  • 1
    I've searched Google and SO, and people there talk about the performance of RandomAccessFile. What do you think of this? – dotrinh PM Jul 12 '22 at 11:38
  • 1
    I mean what do you think of the performance of FileInputStream vs RAF? – dotrinh PM Jul 12 '22 at 17:16
  • 2
    @dotrinh `RandomAccessFile` can do things `FileInputStream` can't. But, if used like an input stream, sticking with calls to [`read(byte[])`,](https://docs.oracle.com/javase/8/docs/api/java/io/RandomAccessFile.html#read-byte:A-) my guess is that they will perform identically because they are making identical system calls. That would be the most reasonable implementation, and the documentation hints at that too. It would be easy to benchmark and know for certain if needed. – erickson Jul 12 '22 at 18:54
23

If you get the associated FileChannel from the FileInputStream, you can use the position method to set the file pointer to anywhere in the file.

import java.io.FileInputStream;
import java.nio.channels.FileChannel;

FileInputStream fis = new FileInputStream("/etc/hosts");
FileChannel fc = fis.getChannel();

fc.position(100); // set the file pointer to byte offset 100
nhahtdh
SW-Eng
8

java.nio.channels.FileChannel has a method position(long) to reset the position back to zero like fseek() in C.
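
To make the two-pass idea concrete, here is a small sketch that reads a file to the end, rewinds it with position(0), and reads it again through the same FileInputStream. The class and method names (TwoPassRead, drain, readTwice) are invented for this example, and the byte counting just stands in for whatever each pass really does.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;

// Hypothetical helper for illustration only.
class TwoPassRead {
    /** Reads the stream to end of file and returns the number of bytes seen. */
    static long drain(FileInputStream in) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            total += n;
        }
        return total;
    }

    /** Returns the byte counts seen on pass one and pass two. */
    static long[] readTwice(String path) throws IOException {
        try (FileInputStream in = new FileInputStream(path)) {
            FileChannel channel = in.getChannel();
            long first = drain(in);   // pass 1: e.g. precompute capacities
            channel.position(0);      // rewind, much like rewind() in C
            long second = drain(in);  // pass 2: the "real" work
            return new long[] { first, second };
        }
    }
}
```

The stream and its channel share one file position, which is why repositioning the channel affects subsequent reads from the stream.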

Arne Burmeister
8

RandomAccessFile is what you want:

dfa
2

Check out java.io.RandomAccessFile

Rich Seller
  • 1
    ok, thanks. looks like I can use it to open the file and then use FileChannel as the class to manipulate/read/write it. – Jason S Jul 07 '09 at 20:39
  • 1
    Too bad RandomAccessFile doesn't implement InputStream with its mark()/reset() methods. >:( – Jason S Jul 07 '09 at 20:43
  • You can roll your own fairly easily (if not that elegantly), see http://www.coderanch.com/t/277378/Streams/java/InputStream-RandomAccessFile-best-way for an example – Rich Seller Jul 07 '09 at 20:53
  • thanks but it's the other direction (accessing a RandomAccessFile as an InputStream). FileChannel is an OK class to pass around in my interface. – Jason S Jul 07 '09 at 21:17
2

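BufferedInputStream has mark(readlimit) and reset(). For the mark to stay valid across a whole pass, readlimit must be at least the file size; file.length() + 1 is enough (cast to int, since readlimit is an int, so this only works for files under 2 GB). The mark remains valid until more than readlimit bytes have been read, so you can go back with reset().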
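
For illustration, here is a minimal sketch of mark(readlimit) and reset() on a BufferedInputStream. The names BufferedMarkDemo, readFully, and readPrefixTwice are invented for this example, and it only re-reads a short prefix, because the marked bytes are kept in memory.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical demo class for illustration only.
class BufferedMarkDemo {
    /** Fills buf as far as possible; returns the number of bytes read. */
    static int readFully(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n == -1) {
                break; // end of stream before buf was full
            }
            total += n;
        }
        return total;
    }

    /** Reads the first `count` bytes of `source` twice using mark/reset. */
    static byte[][] readPrefixTwice(InputStream source, int count) throws IOException {
        BufferedInputStream in = new BufferedInputStream(source);
        in.mark(count);           // mark stays valid for up to `count` bytes
        byte[] first = new byte[count];
        readFully(in, first);
        in.reset();               // back to the marked position
        byte[] second = new byte[count];
        readFully(in, second);
        return new byte[][] { first, second };
    }
}
```

Both returned arrays hold the same prefix. Scaling readlimit up to the file size means the stream may end up buffering the entire file.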

Echilon
Byung Ahn
2

What you want is a RandomAccessFileInputStream: an InputStream implementation with mark/reset (and sometimes seek) built on RandomAccessFile. Several implementations exist that might do what you need.

One example, complete with sources, is at http://www.fuin.org/utils4j/index.html, but you can find many others by searching the internet, and it is easy enough to code yourself if none fits exactly.

janisz
2

PushbackInputStream will also work, as long as you know how many bytes you want to be able to rewind.
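
Here is a small sketch of that idea: read a few bytes, then push them back with unread() so the next reader sees them again. PushbackDemo and peek are names invented for this example, and the sketch assumes the PushbackInputStream was constructed with a pushback buffer at least as large as the number of bytes to rewind.

```java
import java.io.IOException;
import java.io.PushbackInputStream;
import java.util.Arrays;

// Hypothetical demo class for illustration only.
class PushbackDemo {
    /**
     * Returns up to `count` bytes from the front of the stream without
     * consuming them. The stream's pushback buffer must hold `count` bytes.
     */
    static byte[] peek(PushbackInputStream in, int count) throws IOException {
        byte[] head = new byte[count];
        int n = in.read(head, 0, count);
        if (n <= 0) {
            return new byte[0]; // nothing available (end of stream)
        }
        in.unread(head, 0, n);  // push the bytes back for the next reader
        return Arrays.copyOf(head, n);
    }
}
```

A caller would create the stream with something like new PushbackInputStream(source, 3) before calling peek(in, 3). This suits small look-aheads; rewinding a whole large file this way would mean holding it in memory.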