
I am trying to clear up an issue that occurs with unflushed file I/O buffers in a couple of programs, in different languages, running on Linux. The fix, explicitly flushing the buffers, is easy enough, but the unflushed buffers appear quite randomly. Rather than seek help on what may cause it, I am interested in how to create (reproduce) and diagnose this kind of situation.

This leads to a two-part question:

  1. Is it feasible to artificially and easily construct instances where, for a given period of time, one can have output buffers that are known to be unflushed? My searches are turning up empty. A trivial baseline is to hammer the hard drive (e.g. swapping) in one process while trying to write a large amount of data from another process. While this "works", it makes the system practically unusable: I can't poke around and see what's going on.

  2. Are there commands from within Linux that can identify that a given process has unflushed file output buffers? Is this something that can be run at the command line, or is it necessary to query the kernel directly? I have been looking at `fsync`, `sync`, `ioctl`, `flush`, `bdflush`, and others. However, lacking a method for creating unflushed buffers, it's not clear what these may reveal.

In order to reproduce for others, an example for #1 in C would be excellent, but the question is truly language agnostic - just knowing an approach to create this situation would help in the other languages I'm working in.


Update 1: My apologies for any confusion. As several people have pointed out, buffers can be in the kernel space or the user space. This helped pinpoint the problems: we're creating big dirty kernel buffers. This distinction and the answers completely resolve #1: it now seems clear how to re-create unflushed buffers in either user space or kernel space. Identifying which process ID has dirty kernel buffers is not yet clear, though.

Iterator
    Are you talking about the kernel's I/O buffers or user-space I/O buffers? If it's user-space you're talking about, then the kernel has no knowledge at all about those. – Adam Rosenfield Aug 22 '11 at 03:17
  • @Adam: I had not thought about that. I believe everything is in the kernel I/O buffers. This is true for one program, and the other is much the same (inspired by the first). – Iterator Aug 22 '11 at 03:43
  • @Iterator: You need to figure out which one of these you mean. For kernel buffers, there is no "issue", because all processes on the system share the kernel (and therefore the kernel buffers). So what "issue" are you talking about, exactly? – Nemo Aug 22 '11 at 04:25
  • You can probably externally check whether the `O_SYNC` flag was on in the open call, but nobody ever uses that. Even vi waits till the end to do an fsync, and almost nothing else ever bothers to make sure the buffers get written to disk. – tchrist Aug 22 '11 at 04:30
  • @Nemo: Apologies for any lack of clarity - it's not my area of expertise. I'm not trying to fix the programs per se (explicitly flushing seems to resolve the problems), but the "issue" of encountering unflushed buffers is not yet reproducible in these programs, nor detectable until we examine file output. My interest is more general: to make a clean, simple example of unflushed kernel buffers and learn how to see this externally. The examples given by "caf" appear to be a good starting point for #1, though I need to improve my understanding to be sure. – Iterator Aug 22 '11 at 04:38

3 Answers


A simple program that would have an unflushed buffer would be:

#include <stdio.h>   /* printf */
#include <unistd.h>  /* pause */

int main(void)
{
    printf("moo");  /* no trailing newline: the text sits in the stdio buffer */
    pause();        /* block until a signal arrives; the buffer is never flushed */
}

By default, stdio only flushes stdout on newlines when it is connected to a terminal; when stdout is redirected to a file or pipe, it is fully buffered and flushes only when the buffer fills or the stream is closed.
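
For contrast, a minimal sketch of the same program with an explicit flush: the data leaves user space immediately, though it may still sit in kernel buffers afterwards.

#include <stdio.h>   /* printf, fflush */
#include <unistd.h>  /* pause */

int main(void)
{
    printf("moo");
    fflush(stdout);  /* hand the stdio buffer to the kernel via write(2) */
    pause();         /* "moo" is now visible even though we never exit */
}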

Dave
  • Good point. This might not match the underlying cause, but it's a reminder that I need to check the basics, too. :) I wonder if this buffered output can be diagnosed externally. – Iterator Aug 22 '11 at 03:54
  • Probably not; the stdout buffer is all user space. It's a big (4K or 32K) char array which `printf` and `puts` append to until it fills or is given a newline, whereupon libc does a `write(1, buffer, buffer_fill);`. From an strace point of view, there is no buffer. – Dave Aug 22 '11 at 04:09
  • Thanks. I'd overlooked `strace` - that may help with the 2nd part of my question. – Iterator Aug 22 '11 at 04:25

It is very easy to cause unflushed buffers by controlling the receiving side. The beauty of *nix systems is that everything looks like a file, so you can use special files to do what you want. The easiest option is a pipe. If you just want to control stdout, this is the simplest approach: `unflushed_program | slow_consumer`. Otherwise, you can use named pipes:

mkfifo pipe_file
unflushed_program --output pipe_file
slow_consumer --input pipe_file

`slow_consumer` is most likely a program you design to read data slowly, or one that just reads X bytes and stops.
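
A minimal sketch of such a consumer in C (the 16-byte chunk size and one-second delay are arbitrary choices):

#include <unistd.h>  /* read, sleep */

int main(void)
{
    char buf[16];

    /* Drain stdin a few bytes at a time, pausing between reads.
       The pipe (typically 64K on Linux) fills up behind us, and the
       writer's own buffers back up behind the pipe. */
    while (read(STDIN_FILENO, buf, sizeof buf) > 0)
        sleep(1);

    return 0;
}

Compile it and use it as the reading end, e.g. `unflushed_program > pipe_file` in one shell and `./slow_consumer < pipe_file` in another.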

Klox
  • That's an interesting point. However, what if the "slow_consumer" is the filesystem? Slowing down the file I/O has turned out to be quite unpleasant. I tried something like what you're suggesting: make the HD really slooooooooooow, but that makes life unbearable and makes it hard to see what's going on. – Iterator Aug 22 '11 at 04:00
  • On second thought, this `mkfifo` command is particularly intriguing. Maybe I misunderstood your answer. Does this mean that I can artificially slow down the rate at which the first program writes to the file by using `mkfifo` and pipes to introduce a speed regulating "slow_consumer"? – Iterator Aug 22 '11 at 04:03
  • I need to investigate further how I can use this, but I wanted to say that `mkfifo` and named pipes completely solve an entirely unrelated problem I've been banging on for awhile. So, thanks for solving a question that never saw the light of SO! :) – Iterator Aug 22 '11 at 04:13
  • It sounds like you've worked it out, but I thought I'd answer your first question directly: "what if the slow_consumer is the filesystem?" The point is that a pipe just makes the filesystem a conduit, and the slow_consumer is whatever program you want. You aren't likely to use a pipe for a deployed program, but it simulates all of the same behaviors of a slow disk. I'm glad I've been able to help. Just add a comment if you have more questions on this technique. – Klox Aug 22 '11 at 18:24

If you are interested in the kernel-buffered data, then you can tune the VM writeback through the sysctls in /proc/sys/vm/dirty_*. In particular, dirty_expire_centisecs is the age, in hundredths of a second, at which dirty data becomes eligible for writeback. Increasing this value will give you a larger window of time in which to do your investigation. You can also increase dirty_ratio and dirty_background_ratio (which are percentages of system memory, defining the point at which synchronous and asynchronous writeback start respectively).

Actually creating dirty pages is easy - just `write(2)` to a file and exit without syncing, or dirty some pages in a `MAP_SHARED` mapping of a file.
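
A minimal sketch of the `write(2)` route (the file name and sizes are arbitrary); a `MAP_SHARED` variant would mmap() the file and store through the mapping instead:

#include <fcntl.h>   /* open */
#include <string.h>  /* memset */
#include <unistd.h>  /* write, close */

int main(void)
{
    char buf[4096];
    int i;
    int fd = open("dirty_demo.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return 1;

    memset(buf, 'x', sizeof buf);

    /* Write 256 MiB and exit without fsync()/sync(): the pages sit
       dirty in the page cache until writeback expires them. */
    for (i = 0; i < 65536; i++)
        if (write(fd, buf, sizeof buf) < 0)
            break;

    close(fd);  /* deliberately no fsync() */
    return 0;
}

With dirty_expire_centisecs raised beforehand, `grep ^Dirty /proc/meminfo` should show the backlog for the whole window.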

caf
  • That looks promising. Is there some way that the presence of kernel-buffered data is visible externally? I suspect your second paragraph describes the mechanism that is occurring, which suggests where we might catch such issues. – Iterator Aug 22 '11 at 04:18
  • @Iterator: You can get a system-wide count of the dirty data with `grep ^Dirty /proc/meminfo`. Note though that exiting with unflushed data in the kernel is **not** generally a problem - that data is immediately visible to other processes, and will eventually be written back (except under *force majeure*, like a system crash). – caf Aug 22 '11 at 04:38
  • Are centisecs really tenths of a second (rather than hundredths)? Surely, tenths of a second would be decisecs. – Jonathan Leffler Aug 22 '11 at 08:27
  • @Jonathan Leffler: Err yes, you are quite correct. Fixed, thanks! – caf Aug 22 '11 at 09:16
  • @caf: A follow-up: is it possible to see which process ID is responsible for the dirty pages? Or does the kernel buffer sharing across processes (as nemo mentioned above) imply that I can't attribute it to a particular process? – Iterator Aug 22 '11 at 15:34
  • @Iterator: That's right, dirty pages aren't accounted to a particular process. In many cases the process will have exited already anyway, so it doesn't even *have* a process ID anymore. – caf Aug 22 '11 at 22:47
  • @caf: Argh. That's bad news. I'm not sure if this is contradictory, but it seems that `pmap -x` and `/proc/PID/statm` give some very relevant information regarding dirty pages. I think you've cleared things up & I think monitoring pmap & statm may get me to where I need to be regarding a monitoring solution. Thanks! – Iterator Aug 22 '11 at 23:23
  • @Iterator: Those would be pages dirtied by the process within current memory mappings, but it doesn't count pages dirtied with a plain old `write()` or mappings that have since been unmapped. – caf Aug 22 '11 at 23:25
  • @Iterator: It's still not clear why you should be having a problem - creating "big dirty kernel buffers" should be fine, unless your system frequently crashes. – caf Aug 22 '11 at 23:47
  • @caf: That I can answer: because I have multiple instances of multiple programs running, associating a PID with the dirty buffers helps look into the logs for that instance to see what it was doing. By design, there are many random (as in stochastic) variations between instances. Monitoring them should help improve their efficiency. – Iterator Aug 23 '11 at 00:05