9

Our build is annoyingly slow. It's a Java system built with Ant, and I'm running mine on Windows XP. Depending on the hardware, it can take between 5 and 15 minutes to complete.

Watching overall performance metrics on the machine, as well as correlating hardware differences with build times, indicates that the process is I/O bound. It also shows that the process does a lot more reading than writing.

However, I haven't found a good way to determine which files are being read or written, and how many times. My suspicion is that with our many subprojects and subsequent invocations of the compiler, the build is re-reading the same commonly used libraries many times.

What are some profiling tools that will tell me what a given process is doing with which files? Free is nice, but not essential.


Using Process Monitor, as suggested by Jon Skeet, I was able to confirm my suspicion: almost all of the disk activity was reading and re-reading of libraries, with the JDK's copies of "rt.jar" and other libraries at the top of the list. I can't make a RAM disk large enough to hold all the libraries I used, but mounting the "hottest" libraries on a RAM disk cut build times by about 40%; clearly, Windows file system caching isn't doing a good enough job, even though I've told Windows to optimize for that.

One interesting thing I noticed is that the typical 'read' operation on a JAR file is just a few dozen bytes; usually there are two or three of these, followed by a skip several kilobytes further on in the file. This access pattern appears to be ill-suited to bulk reads.
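The pattern described above is consistent with how a JAR (ZIP) archive is laid out: each entry is preceded by a small local header (30 fixed bytes plus the entry name), and the entries themselves can lie kilobytes apart. Here is a minimal sketch, assuming Python's standard `zipfile` module, that lists where each entry's header sits in the file. The helper name `entry_layout` and the in-memory sample archive are made up for illustration.

```python
import io
import zipfile

def entry_layout(jar):
    """Return (name, header_offset, compressed_size) per entry, in file order.

    `jar` may be a path or a seekable file-like object. `entry_layout` is a
    hypothetical helper name, not part of any build tool.
    """
    with zipfile.ZipFile(jar) as zf:
        infos = sorted(zf.infolist(), key=lambda info: info.header_offset)
        return [(i.filename, i.header_offset, i.compress_size) for i in infos]

if __name__ == "__main__":
    # Build a tiny in-memory archive so the example runs without a real JAR.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr("a/A.class", b"x" * 5000)
        zf.writestr("a/B.class", b"y" * 3000)
    for name, offset, size in entry_layout(buf):
        print(f"{name}: header at byte {offset}, {size} bytes of data")
```

Pointing it at one of the "hot" JARs from the Process Monitor trace should show whether the offsets being read line up with entry headers.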

I'm going to do more testing with all of my third-party libraries on a flash drive, and see what effect that has.

erickson
  • One quick question erickson, how did you figure out how many bytes are being read with the ProcessMonitor? I'm having the same problem trying to profile our builds with Windows XP – Alex. S. Oct 24 '12 at 14:23
  • Just figured out now, in the Detail column for ReadFile operations, for example, it says Offset: N bytes, Length: M bytes, and so on. – Alex. S. Oct 24 '12 at 14:27
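To turn those per-operation details into totals per file, the trace can be exported and summarized with a short script. This is a minimal sketch, assuming the trace was saved from Process Monitor as CSV with "Operation", "Path" and "Detail" columns, and that ReadFile details look like "Offset: N, Length: M" as described above. The function name `summarize_reads`, the file name `build.csv` and the sample rows are hypothetical.

```python
import csv
import io
import re
from collections import defaultdict

# Pulls M out of "Length: M"; M may carry thousands separators (assumed format).
LENGTH_RE = re.compile(r"Length:\s*(\d+(?:,\d{3})*)")

def summarize_reads(csv_file):
    """Return {path: (read_count, total_bytes)} for all ReadFile rows."""
    totals = defaultdict(lambda: [0, 0])
    for row in csv.DictReader(csv_file):
        if row.get("Operation") != "ReadFile":
            continue
        match = LENGTH_RE.search(row.get("Detail") or "")
        if match is None:
            continue
        entry = totals[row["Path"]]
        entry[0] += 1
        entry[1] += int(match.group(1).replace(",", ""))
    return {path: (count, size) for path, (count, size) in totals.items()}

if __name__ == "__main__":
    # Made-up rows standing in for a real export.
    sample = io.StringIO(
        '"Operation","Path","Detail"\n'
        '"ReadFile","rt.jar","Offset: 0, Length: 30"\n'
        '"ReadFile","rt.jar","Offset: 8,192, Length: 46"\n'
    )
    ranked = sorted(summarize_reads(sample).items(), key=lambda kv: -kv[1][0])
    for path, (count, size) in ranked:
        print(f"{count:6d} reads {size:10d} bytes  {path}")
```

For a real export, something like `open("build.csv", newline="", encoding="utf-8-sig")` should work as the input, but check the column names in your own file first.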

5 Answers

7

If you only need it for Windows, SysInternals Process Monitor should show you everything you need to know. You can select the process, then see each operation as it happens and get a summary of file operations as well.

Jon Skeet
  • Thanks Jon. I've used Process Explorer in the past. Is this a successor to that product, or something completely separate? – erickson Jan 29 '09 at 20:15
  • Process Explorer is a sort of Task Manager alternative. Process Monitor shows you every I/O operation, like opening a file, writing to the registry, etc. – lacop Jan 29 '09 at 20:25
1

Back when I still used Windows, I used to get good results speeding my build up by having all build output written to a separate partition of maybe 3 GB in size, and reformatting it once a week at night via a scheduled task. It's just build output, so it doesn't matter if it gets unilaterally flattened occasionally.

But honestly, since moving to Linux, disk fragmentation is something I never worry about any more.

Another reason to try your build on Linux, at least once, is so that you can run strace (grepped for calls to open) to see what files your build is touching.
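As a rough illustration of the strace idea, the trace can be written to a file (for example with `strace -f -e trace=open,openat -o trace.log ant`, where `trace.log` is just a placeholder name) and then counted per path. This is a sketch, not a complete strace parser: the helper `count_opens` is hypothetical, and the regular expression only covers the common `open("path", ...) = fd` and `openat(AT_FDCWD, "path", ...) = fd` line shapes.

```python
import re
from collections import Counter

# Captures the quoted path and the numeric return value of open/openat calls.
OPEN_RE = re.compile(
    r'\bopen(?:at)?\((?:[^,"]+,\s*)?"([^"]*)".*\)\s*=\s*(-?\d+)'
)

def count_opens(lines):
    """Count successful open/openat calls per path in strace output lines."""
    counts = Counter()
    for line in lines:
        match = OPEN_RE.search(line)
        # A negative return value means the call failed (e.g. ENOENT).
        if match and int(match.group(2)) >= 0:
            counts[match.group(1)] += 1
    return counts

if __name__ == "__main__":
    # Made-up lines in the shape strace typically prints.
    sample = [
        'open("/lib/rt.jar", O_RDONLY) = 3',
        '[pid 42] openat(AT_FDCWD, "/lib/rt.jar", O_RDONLY|O_CLOEXEC) = 4',
        'open("/missing.jar", O_RDONLY) = -1 ENOENT (No such file or directory)',
    ]
    for path, n in count_opens(sample).most_common():
        print(n, path)
```

Files that show up with a high count are the ones being reopened again and again across compiler invocations.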

Ben Hardy
  • Procmon/Filemon actually give similar information to strace. I was able to see every open, meta-data query, read, and write operation. – erickson Jan 30 '09 at 22:23
1

An oldie but a goodie: create a RAM disk and compile your files from there.

Jeffrey Fredrick
0

I used to build a massive Java webapp (JSP frontend) using Ant on Windows and it would take upwards of 3 minutes. I wiped my computer and installed Linux, and suddenly the builds took 18 seconds. Those are real numbers, albeit about 3 years old. I can only assume that Java prefers the Linux memory management and threading models to the Windows equivalents, as all Java programs appear to run better under Linux in my experience (especially Eclipse). Linux seems a lot better at avoiding extra reads from the disk when you're doing a lot of reading of files that haven't changed (i.e. executables and libraries). This may be a property of the disk cache or the filesystem; I'm not sure which.

One of the great things about Java is that it's cross-platform, so setting up a Linux-based build server is actually an option for you. Being something of a Linux evangelist, I'd of course prefer to see you switch your dev environment to Linux, but I know that a lot of people don't want to do that (or can't for practical reasons).

If you're not willing to even set up a Linux build server to see if it runs faster, you could at least try defragmenting your Windows machine's hard drive. That makes a huge difference for C++ builds on my work computer. Try JkDefrag, which seems a lot better than the defragmenter that comes with Windows.

EDIT: I'd assume I got a downvote because my answer doesn't address the exact question asked. It is, however, in the tradition of StackOverflow to help people fix their real problem, not just treat the symptoms. I'm not one of those people for whom the answer to every question is "use linux". In this instance, however, I have very real, measured performance gains in exactly the situation the OP is asking about, so I thought it worth sharing my experiences.

rmeador
  • while I don't doubt switching to linux would improve performance, this is hardly an answer to a question regarding profiling IO on windows – sgibbons Jan 29 '09 at 21:23
  • Thanks rmeador. A lot of our developers do run Linux, and it does help. Its file system cache seems to be much better than Windows'. There's also some suspicion that Microsoft has deliberately hobbled performance of kernel calls by non-M$ code. ;) However, even Linux builds are too slow. – erickson Jan 29 '09 at 22:39
0

Actually, FileMon is a more direct tool than ProcMon. In general, when running performance analysis for disk I/O, consider the following two metrics:

  • Throughput (bytes read or written per second)
  • Latency (how long a read/write request waits in the queue)

Once you evaluate the performance of your system in terms of the above, it is easy to identify the bottleneck and take corrective action: get faster disks or change your code (whichever works out cheaper).
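For a quick application-level feel for both numbers, a file can be read in fixed-size blocks while timing the calls. This is only a rough sketch: the function name `measure_reads` and the block size are arbitrary choices, and the mean time per read call is merely a proxy for latency, since it includes OS caching effects and says nothing about the actual device queue.

```python
import os
import tempfile
import time

def measure_reads(path, block_size=64 * 1024):
    """Read `path` sequentially in blocks.

    Returns (total_bytes, bytes_per_second, mean_seconds_per_read).
    """
    total = 0
    reads = 0
    start = time.perf_counter()
    # buffering=0 disables Python's own buffering; the OS cache still applies.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
            reads += 1
    elapsed = time.perf_counter() - start
    throughput = total / elapsed if elapsed > 0 else float("inf")
    mean_latency = elapsed / reads if reads else 0.0
    return total, throughput, mean_latency

if __name__ == "__main__":
    # Use a throwaway file so the example runs anywhere.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(1024 * 1024))
    try:
        total, bps, latency = measure_reads(tmp.name)
        print(f"{total} bytes, {bps:.0f} B/s, {latency * 1e6:.1f} us per read")
    finally:
        os.remove(tmp.name)
```

Running it with a very small block size against a large library file gives a crude impression of how expensive many tiny reads are compared with a few large ones.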

Sesh