
I want to copy the last 10MB of a possibly large file into another file. Ideally I would use FileInputStream, skip() and then read(). However, I'm unsure whether skip() will perform badly. Is skip() typically implemented using a file seek underneath, or does it actually read and discard data?

I know about RandomAccessFile but I'm interested in whether I could use FileInputStream in place of that (RandomAccessFile is annoying as the API is non-standard).
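A minimal sketch of the FileInputStream + skip() + read() approach described above. The class and method names (`TailCopy`, `copyTail`) and the 64 KB buffer size are hypothetical choices for illustration. Because `InputStream.skip()` is allowed to skip fewer bytes than requested, the sketch loops until the full offset has been skipped.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class TailCopy {
    // Copies the last tailBytes bytes of src into dst (hypothetical helper).
    // Returns the number of bytes copied.
    static long copyTail(String src, String dst, long tailBytes) throws IOException {
        try (FileInputStream in = new FileInputStream(src);
             OutputStream out = new FileOutputStream(dst)) {
            long size = in.getChannel().size();
            long toSkip = Math.max(0, size - tailBytes);
            // skip() may skip fewer bytes than requested, so keep going until done.
            while (toSkip > 0) {
                long skipped = in.skip(toSkip);
                if (skipped <= 0) {
                    throw new IOException("Unable to skip to the requested offset");
                }
                toSkip -= skipped;
            }
            // Copy everything from the current position to EOF.
            byte[] buf = new byte[64 * 1024];
            long copied = 0;
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
                copied += n;
            }
            return copied;
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java TailCopy <source> <destination>
        copyTail(args[0], args[1], 10L * 1024 * 1024);
    }
}
```

If the file is smaller than the requested tail, nothing is skipped and the whole file is copied.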

Mike Q

2 Answers


It depends on your JVM, but here's the native source for FileInputStream.skip() in a recent OpenJDK:

JNIEXPORT jlong JNICALL
Java_java_io_FileInputStream_skip(JNIEnv *env, jobject this, jlong toSkip) {
    jlong cur = jlong_zero;
    jlong end = jlong_zero;
    FD fd = GET_FD(this, fis_fd);
    if (fd == -1) {
        JNU_ThrowIOException (env, "Stream Closed");
        return 0;
    }
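    /* Two lseek() calls: the first queries the current offset without moving it,
       the second moves the offset forward by toSkip bytes. No file data is read. */
    if ((cur = IO_Lseek(fd, (jlong)0, (jint)SEEK_CUR)) == -1) {
        JNU_ThrowIOExceptionWithLastError(env, "Seek error");
    } else if ((end = IO_Lseek(fd, toSkip, (jint)SEEK_CUR)) == -1) {
        JNU_ThrowIOExceptionWithLastError(env, "Seek error");
    }
    /* Bytes skipped = new offset minus old offset (lseek may move past EOF). */
    return (end - cur);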
}

So it does a seek (two lseek() calls) rather than reading and discarding data. However, I don't see why RandomAccessFile is non-standard. It has been part of the java.io package since 1.0.
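If the objection to RandomAccessFile is that utility libraries expect an InputStream, one option is to seek with RandomAccessFile and then wrap its channel with `java.nio.channels.Channels.newInputStream`. The RandomAccessFile docs state that the channel's position always equals the file pointer, so the stream starts reading at the seek offset. The names `RafTail` and `openTail` are hypothetical, and this is a sketch rather than a drop-in utility.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.channels.Channels;

public class RafTail {
    // Returns an InputStream positioned at the last tailBytes bytes of the file
    // (hypothetical helper). The caller must close the returned stream, which
    // closes the underlying channel.
    static InputStream openTail(String path, long tailBytes) throws IOException {
        RandomAccessFile raf = new RandomAccessFile(path, "r");
        try {
            // seek() sets the file pointer; the channel shares that position.
            raf.seek(Math.max(0, raf.length() - tailBytes));
            return Channels.newInputStream(raf.getChannel());
        } catch (IOException e) {
            raf.close();
            throw e;
        }
    }
}
```

The returned stream can then be handed to any code that expects a plain InputStream.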

The Alchemist
    Thanks. When I say RandomAccessFile is non-standard, I mean it isn't an InputStream, nor does it provide a way of getting the InputStream that utility libraries typically expect. Probably just the nature of what an RAF is. – Mike Q Sep 09 '10 at 11:13
  • My problem is that after calling skip(), I use the FileChannel from fis.getChannel() to get a CharSequence to apply a regex to. Unfortunately, the FileChannel just restores the skipped input. – stackunderflow Oct 19 '14 at 11:14
    The problem with this is that we have to rely on that particular implementation to deduce the fact that it does a seek since AFAIK the information is not part of the interface/documentation. That is not a good idea. But maybe "discarding the skipped bytes" as it is stated means seeking over them to the desired position. – Ludovic Kuty Oct 06 '16 at 09:02
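One way to avoid depending on how skip() happens to be implemented is to set the position through the stream's FileChannel. The `FileInputStream.getChannel()` documentation states that changing the channel's position changes the stream's file position, so subsequent `read()` calls (and reads through the same channel) start at that offset. The names `ChannelSeek` and `openTail` are hypothetical; this is a sketch, not a library API.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;

public class ChannelSeek {
    // Opens path and positions the stream at the start of the last tailBytes bytes
    // (hypothetical helper). The caller must close the returned stream.
    static FileInputStream openTail(String path, long tailBytes) throws IOException {
        FileInputStream in = new FileInputStream(path);
        try {
            FileChannel ch = in.getChannel();
            // Documented behaviour: moving the channel moves the stream's position.
            ch.position(Math.max(0, ch.size() - tailBytes));
            return in;
        } catch (IOException e) {
            in.close();
            throw e;
        }
    }
}
```

Since the stream and its channel share a single position, reading through either one continues from the offset set here.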

You will be interested in this LINK.

It says that seek is faster than skip.

Bilel Boulifa