
How can data written to a file actually be flushed/synced to the block device from Java?

I tried this code with NIO:

FileOutputStream s = new FileOutputStream(filename);
FileChannel c = s.getChannel();
while (xyz) {
    c.write(buffer);
}
c.force(true);
s.getFD().sync();
c.close();

I supposed that c.force(true) together with s.getFD().sync() should be sufficient, because the documentation for force states:

Forces any updates to this channel's file to be written to the storage device that contains it. If this channel's file resides on a local storage device then when this method returns it is guaranteed that all changes made to the file since this channel was created, or since this method was last invoked, will have been written to that device. This is useful for ensuring that critical information is not lost in the event of a system crash.

The documentation to sync states:

Force all system buffers to synchronize with the underlying device. This method returns after all modified data and attributes of this FileDescriptor have been written to the relevant device(s). In particular, if this FileDescriptor refers to a physical storage medium, such as a file in a file system, sync will not return until all in-memory modified copies of buffers associated with this FileDescriptor have been written to the physical medium. sync is meant to be used by code that requires physical storage (such as a file) to be in a known state.

These two calls should be sufficient. Are they? I suspect they aren't.

Background: I'm doing a small performance comparison (2 GB, sequential write) in C and Java, and the Java version is twice as fast as the C version and probably faster than the hardware allows (120 MB/s on a single HD). I also tried executing the command-line tool sync via Runtime.getRuntime().exec("sync"), but that didn't change the behavior.

The C code, which results in 70 MB/s, is as follows (using the low-level APIs (open, write, close) doesn't change much):

FILE* fp = fopen(filename, "w");
while(xyz) {
    fwrite(buffer, 1, BLOCK_SIZE, fp);
}
fflush(fp);
fclose(fp);
sync();

Without the final call to sync I got unrealistic values (over 1 GB/s, i.e. main-memory performance).

Why is there such a big difference between C and Java? There are two possibilities: either I don't sync the data correctly in Java, or the C code is suboptimal for some reason.

Update: I have done strace runs with "strace -cfT cmd". Here are the results:

C (Low-Level API): MB/s 67.389782

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 87.21    0.200012      200012         1           fdatasync
 11.05    0.025345           1     32772           write
  1.74    0.004000        4000         1           sync

C (High-Level API): MB/s 61.796458

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 73.19    0.144009      144009         1           sync
 26.81    0.052739           1       65539           write

Java (1.6 SUN JRE, java.io API): MB/s 128.6755466197537

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 80.07  105.387609        3215     32776           write
  2.58    3.390060        3201      1059           read
  0.62    0.815251      815251         1           fsync

Java (1.6 SUN JRE, java.nio API): MB/s 127.45830221558376

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
  5.52    0.980061      490031         2           fsync
  1.60    0.284752           9     32774           write
  0.00    0.000000           0        80           close

The time values seem to be system time only and are therefore pretty meaningless.

Update 2: I switched to another server, rebooted, and used a freshly formatted ext3. Now I get only a 4% difference between Java and C. I simply don't know what went wrong. Sometimes things are strange. I should have tried the measurement on another system before writing this question. Sorry.

Update 3: To summarize the answers:

  • Use c.force(true) followed by s.getFD().sync() for Java NIO, and s.flush() followed by s.getFD().sync() for Java's stream API. For the high-level API in C, don't forget to sync: fflush submits the data to the OS, but doesn't bring it to the block device.
  • Use strace to analyze the syscalls issued by a command.
  • Cross check your results before posting a question.

Update 4: Please note the following follow-up question.

    I'd really like to see throughput using just section 2 functions. – Charlie Martin Apr 09 '09 at 16:35
  • What are you using for BLOCK_SIZE? Is it the same size as your buffer in Java? 512 is going to be very suboptimal these days. You'd probably want at least 4096 (page size on x86) or possibly higher. I've seen measurable improvements up to 32k on some machines. Oh, and of course if your buffer is page-aligned it will give the kernel more room for optimization. – aij Mar 12 '14 at 23:59
  • Another possible issue is the code you posted is not using the "low level APIs (open,write,close)". It's using the higher level, portable stdio API (fopen,fwrite,fclose) which will add an extra layer of buffering by default. Did you explicitly turn off buffering somewhere outside the code you posted? – aij Mar 13 '14 at 00:13

5 Answers


Actually, in C you want to just call fsync() on the one file descriptor, not sync() (or the "sync" command), which signals the kernel to flush all buffers to disk system-wide.

If you strace (getting Linux-specific here) the JVM you should be able to observe an fsync() or fdatasync() system call being made on your output file. That would be what I'd expect the getFD().sync() call to do. I assume c.force(true) simply flags to NIO that fsync() should be called after each write. It might simply be that the JVM you're using doesn't actually implement the sync() call?

I'm not sure why you weren't seeing any difference when calling "sync" as a command: but obviously, after the first sync invocation, subsequent ones are usually quite a lot faster. Again, I'd be inclined to break out strace (truss on Solaris) as a "what's actually happening here?" tool.

araqnid
  • The idea of tracing the syscalls is good. I will do it tomorrow. – dmeister Apr 08 '09 at 20:05
  • 1
    force() calls fsync or fdatasync (depending on the metadata flag). However, it doesn't set a state to call fsync/fdatasync directly after each call. I looked it up in the OpenJDK source code. – dmeister Apr 09 '09 at 09:40

It is a good idea to use synchronized I/O data-integrity completion. However, your C sample uses the wrong method: sync() is used to sync the whole OS.

If you want to write the blocks of that single file to disk, you need to use fsync(2) or fdatasync(2) in C. BTW: when you use buffered stdio in C (or a BufferedOutputStream or some Writer in Java), you have to flush it first, before you sync.

The fdatasync() variant is a bit more efficient if the file's name and size have not changed since the last sync, but it may not persist all of the metadata. If you want to write your own transactionally safe database system, you need to take care of a few more things (like fsyncing the parent directory).
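The parent-directory point can be sketched like this (a hypothetical helper, not from the question; the directory and file names are assumptions for illustration):

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Create `path` durably: write and fsync the file itself, then fsync
 * the parent directory `dir` so that the new directory entry also
 * survives a crash. Returns 0 on success, -1 on error. */
int create_durably(const char *dir, const char *path,
                   const void *buf, size_t len) {
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;
    if (write(fd, buf, len) != (ssize_t)len || fsync(fd) == -1) {
        close(fd);
        return -1;
    }
    if (close(fd) == -1)
        return -1;

    /* Now make the directory entry itself durable. */
    int dfd = open(dir, O_RDONLY);
    if (dfd == -1)
        return -1;
    int rc = fsync(dfd);
    close(dfd);
    return rc;
}
```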

eckes

You need to tell us more about the hardware and operating system, also the specific Java version. How are you measuring this throughput?

You're correct that force/sync should force the data out to the physical media.


Here's a raw version of copy. Compiled with gcc 4.0 on an Intel Mac, should be clean.

/* rawcopy -- pure C, system calls only, copy argv[1] to argv[2] */

/* This is a test program which simply copies from file to file using
 * only system calls (section 2 of the manual.)
 *
 * Compile:
 *
 *      gcc -Wall -DBUFSIZ=1024 -o rawcopy rawcopy.c
 *
 * If DIRTY is defined, then errors are interpreted with perror(3).
 * This is ifdef'd so that the CLEAN version is free of stdio.  For
 * convenience I'm using BUFSIZ from stdio.h; to compile CLEAN just
 * use the value from your stdio.h in place of 1024 above.
 *
 * Compile DIRTY:
 *
 *      gcc -DDIRTY -Wall -o rawcopy rawcopy.c
 *
 */
#include <fcntl.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <stdlib.h>
#include <unistd.h>
#if defined(DIRTY)
#   if defined(BUFSIZ)
#       error "Don't define your own BUFSIZ when DIRTY"
#   endif
#   include <stdio.h>
#   define PERROR perror(argv[0])
#else
#   define CLEAN
#   define PERROR
#   if ! defined(BUFSIZ)
#       error "You must define your own BUFSIZ with -DBUFSIZ=<number>"
#   endif
#endif

char buffer[BUFSIZ];            /* by definition stdio BUFSIZ should
                                   be optimal size for read/write */

#include <errno.h>              /* I/O errors */

int main(int argc, char * argv[]) {
    int fdi, fdo ;              /* Input/output file descriptors */
    ssize_t len ;               /* length to read/write */
    if(argc != 3){
        PERROR;
        exit(errno);
    }

    /* Open the files, returning perror errno as the exit value if fails. */
    if((fdi = open(argv[1],O_RDONLY)) == -1){
        PERROR;
        exit(errno);
    }
    if((fdo = open(argv[2], O_WRONLY|O_CREAT, 0644)) == -1){
        PERROR;
        exit(errno);
    }

    /* copy BUFSIZ bytes (or whatever was read on the last block) as
       fast as you can. */
    while((len = read(fdi, (void *) buffer, BUFSIZ)) > 0){
        if(write(fdo, (void*)buffer, len) == -1){
            PERROR;
            exit(errno);
        }
    }
    if(len == -1){
        PERROR;
        exit(errno);
    }
    /* close and fsync the files */
    if(fsync(fdo) ==-1){
        PERROR;
        exit(errno);
    }
    if(close(fdo) == -1){
        PERROR;
        exit(errno);
    }
    if(close(fdi) == -1){
        PERROR;
        exit(errno);
    }

    /* if it survived to here, all worked. */
    exit(0);
}
Charlie Martin
  • IcedTea OpenJDK 1.6 Java, openSUSE 11 Linux, 4 Core-CPU, 4 GB, 1 SATA-HD over FiberChannel from a JBOD. – dmeister Apr 08 '09 at 20:04
  • I wrote a 4 GB file using 64K blocks of the same random data and measured the time between file open and file close (and sync if it is done). – dmeister Apr 08 '09 at 20:07
  • Any other workload? The C was with GCC > 4? That configuration's similar to one I've tried at STK (RIP) and 120 MB/s sounds pretty plausible. – Charlie Martin Apr 08 '09 at 20:10
  • Yes, GCC 4.3.2. I plan to evaluate random io next and to add python and Erlang to the list of evaluated languages. – dmeister Apr 09 '09 at 09:14
  • I switched to Suns JRE 1.6.0, but the behavior is very similar. – dmeister Apr 09 '09 at 09:15

(I know this is a very late reply, but I ran into this thread doing a Google search, and that's probably how you ended up here too.)

You're calling sync() in Java on a single file descriptor, so only the buffers related to that one file get flushed out to disk.

In C and on the command line, you're calling sync() for the entire operating system, so every file buffer gets flushed out to disk, for everything your OS is doing.

To be comparable, the C call should be syncfs(fileno(fp)); (note that syncfs takes a file descriptor, not a FILE*).

From the Linux man page:

   sync() causes all buffered modifications to file metadata and data to
   be written to the underlying file systems.

   syncfs() is like sync(), but synchronizes just the file system
   containing the file referred to by the open file descriptor fd.
Adam Fanello
  • syncfs() is not better than sync(); both are wrong. The fdatasync() call is the one Java uses and the one you want to use in C. – eckes May 23 '14 at 20:12

The C code could be suboptimal because it uses stdio rather than the raw OS write(). But then, Java could be more optimal because it allocates larger buffers?

Anyway, you can only trust the API docs. The rest is beyond your responsibility.

Ingo