13

We have an application that imports a large amount of files by splitting the data and sorting it. When running the JUnit test case, the whole process takes about 16 minutes.

Same test, done with mvn clean test -Dtest=MyTest run in 34 minutes.

We are calling in to /bin/sort to sort the files. The sort seems to be taking longer. I don't understand what is different.

Looking at IntelliJ it runs with

/Library/Java/JavaVirtualMachines/1.6.0_26-b03-383.jdk/Contents/Home/bin/java -Didea.launcher.port=7532 -Didea.launcher.bin.path=/Applications/IntelliJ IDEA 10.app/bin -Dfile.encoding=UTF-8 -classpath %classhpath% com.intellij.rt.execution.application.AppMain com.intellij.rt.execution.junit.JUnitStarter -ideVersion5 -junit4 xxx.IntTestImportProcess,testImportProcess

I am on OS X. All the classes are injected using Spring. What are some possible suggestions are theories at what is behind this performance gain in IntelliJ? The tests are identical. I can't share all of the code because there is just so much. But I can add any detail if requested.

Here is my main class and how I am running both.

public static void main(String... args) throws IOException {
        if(args.length != 2) {
            System.out.println("Usage: \n  java -jar client.jar spring.xml data_file");
            System.exit(1);
        }
        ApplicationContext applicationContext = new FileSystemXmlApplicationContext(args[0]);
        PeriodFormatter formatter = new PeriodFormatterBuilder()
                .appendMinutes()
                .appendSuffix("minute", "minutes")
                .appendSeparator(" and ")
                .appendSeconds()
                .appendSuffix("second", "seconds")
                .toFormatter();
        URI output = (URI) applicationContext.getBean("workingDirectory");
        File dir = new File(output);
        if(dir.exists()) {
            Files.deleteDirectoryContents(dir.getCanonicalFile());
        }
        else {
            dir.mkdirs();
        }
        ImportProcess importProcess = applicationContext.getBean(ImportProcess.class);
        long start = System.currentTimeMillis();
        File file = new File(args[1]);
        importProcess.beginImport(file);
        Period period = new Period(System.currentTimeMillis() - start); // in milliseconds
        System.out.println(formatter.print(period.toPeriod()));
    }

I have decided to remove JUnit and just use a main() method. The result are exactly the same. IntelliJ is again. Here is the crazy log.

With IntelliJ

DEBUG [ main] 2011-08-18 13:05:16,259 [er.DelimitedTextUnixDataSorter] Sorting file [/Users/amirraminfar/Desktop/import-process/usage]
DEBUG [ main] 2011-08-18 13:06:09,546 [er.DelimitedTextUnixDataSorter] Sorting file [/Users/amirraminfar/Desktop/import-process/customer]

With java -jar

DEBUG [ main] 2011-08-18 12:10:16,726 [er.DelimitedTextUnixDataSorter] Sorting file [/Users/amirraminfar/Desktop/import-process/usage]
DEBUG [ main] 2011-08-18 12:15:55,893 [er.DelimitedTextUnixDataSorter] Sorting file [/Users/amirraminfar/Desktop/import-process/customer]

The sort command is

sort -t'    ' -f -k32,32f -k18,18f -k1,1n

As you can see above, sorting in Intellij take 1 minutes but in java -jar takes 5 minutes!

Update

I ran everything using /Library/Java/JavaVirtualMachines/1.6.0_26-b03-383.jdk/Contents/Home/bin/java and the sorting still takes well over 5+ mins.

Amir Raminfar
  • 33,777
  • 7
  • 93
  • 123
  • which Java installation do the command-line ``java`` and ``javac`` point to? – michel-slm Aug 18 '11 at 16:54
  • It's not running faster under IntelliJ, it's running slower under Maven! Can you try running it as a raw `java` invocation? – Tom Anderson Aug 18 '11 at 17:01
  • @hircus /usr/bin/java -> /System/Library/Frameworks/JavaVM.framework/Versions/Current/Commands/java and current is symlink to A. I have no idea what A is – Amir Raminfar Aug 18 '11 at 17:02
  • @Tom, I have also done that. I created an executable JAR file via maven dep plugin and it took the same amount of time. I do wonder if hircus has a point about it being a different java? – Amir Raminfar Aug 18 '11 at 17:03
  • How much memory is available to IntelliJ and maven respectively? – Thorbjørn Ravn Andersen Aug 18 '11 at 17:06
  • My timer is around the import and not around the compiling and starting java. – Amir Raminfar Aug 18 '11 at 17:08
  • I always told that those guys at JetBrains are doing magic :) Try running your tests with the same command line that IDEA does. And what happens when your run maven tasks from IDEA? – Denis Tulskiy Aug 18 '11 at 17:09
  • From the command line just do:"java -version". You'll know which Java you get from "A". – toto2 Aug 18 '11 at 17:10
  • @toto2 I am going to try that next. But I am confused because this has nothing to do with java. I am using /bin/sort to sort. Why does calling in IntelliJ matter? – Amir Raminfar Aug 18 '11 at 17:13
  • A newer Java could be more efficient at dealing with system commands. By the way how often do you call /bin/sort? – toto2 Aug 18 '11 at 17:16
  • @toto2 I create 4 files and sort each one. So a total of 4 times. But as you can see above, the sorting is clearly faster. – Amir Raminfar Aug 18 '11 at 17:18
  • 2
    If /bin/sort outputs massively to the System.in (or however Java deals with the output of /bin/sort), some different Java could perform differently. You can also do `/bin/sort input -o outfile`. – toto2 Aug 18 '11 at 17:22
  • Here is the full command `/bin/sh -c sed '1d' /Users/amirraminfar/Desktop/import-process/account | sort -t' ' -f -k4,4f -k10,10f -k1,1n >> /Users/amirraminfar/Desktop/import-process/account.sorting` @toto2 you think that matters? – Amir Raminfar Aug 18 '11 at 17:27
  • I used `/Library/Java/JavaVirtualMachines/1.6.0_26-b03-383.jdk/Contents/Home/bin/java` and still see the slower times. – Amir Raminfar Aug 18 '11 at 17:31
  • Both versions are `java version "1.6.0_26"` so no they are not different java versions. – Amir Raminfar Aug 18 '11 at 17:35
  • I don't know. You could go in IntelliJ and see what flags are given to `java` when you run your code. But maybe some flags are hidden, or given in some hard to reach menu option. – toto2 Aug 18 '11 at 17:40
  • @toto I think the only difference is `com.intellij.rt.execution.application.AppMain` that intellij uses. I can't think of anything else. – Amir Raminfar Aug 18 '11 at 17:41
  • If you run the `java -jar` version twice in a row, does it take the same amount of time for both runs? – Alex B Aug 19 '11 at 18:17
  • @Alex yes it does. Again this isn't a java problem rather the way java starts a new process. – Amir Raminfar Aug 19 '11 at 19:04
  • All the answer was `LANG=C` :( This took far too long to find out. – Amir Raminfar Aug 22 '11 at 20:23

6 Answers6

11

Thank you everybody for helping. It turns out IntelliJ starts sort with LANG=C. Mac OS X terminal sorts by default in UTF8 which explains the performance loss. Hopefully this answer will help somebody.

Amir Raminfar
  • 33,777
  • 7
  • 93
  • 123
1

Is mvn clean doing a rebuild of the project? Is the run under IDEA not doing that? Does building the project with Maven take 18 minutes (I wouldn't be surprised if it did, given that Maven is the absolute pits)?

If the answers to all these questions are 'yes', then i think you have a conclusion.

The solution is to take Maven to the woods, shoot it, then bury it in an unmarked grave.

Tom Anderson
  • 46,189
  • 17
  • 92
  • 133
  • I am wrapping the timer around the actually import and not how long the process takes. I will update my code. – Amir Raminfar Aug 18 '11 at 17:03
  • That seems likely. Try ``mvn clean compile`` and then time a run of ``mvn test`` afterwards – michel-slm Aug 18 '11 at 17:04
  • I pasted my code. I am not compiling anything. All the files have been pre-compiled in a jar file. The timer is around the import and not around the start of java. – Amir Raminfar Aug 18 '11 at 17:07
  • Really the only difference I see is `com.intellij.rt.execution.application.AppMain` which is a wrapper around Intellij runner. – Amir Raminfar Aug 18 '11 at 17:40
1

A guess more than substantiated answer:

A lot may depend on I/O buffering. Sort over 500K records is going to output a lot of data, so the right buffer size may matter a lot. I think the tty is typically line buffered, so it is going to do 500K read & write ops, and the IDE may simply read in much larger buffers.

Additionally, it is possible that OSX has process or I/O scheduling which heavily favours GUI apps over console ones (which could be detected through being bound to a tty), so it might be that you have to wait & idle a lot more time from the console than from within the IDE.

user268396
  • 11,576
  • 2
  • 31
  • 26
0

This question's not exactly properly defined and far from clear. Are you saying you read out a list of files in a folder using Java IO and then pass it to external process to sort them? Sounds as a bit strange solution to me, but anyway, this has to be memory related. Connect to both apps using JConsole and look for clues in charts.

MarianP
  • 2,719
  • 18
  • 17
  • Have already connected to Jconsole and they are both the same. This is not a problem with java but rather the `bin/sort` called by java. – Amir Raminfar Aug 19 '11 at 17:10
  • From what I understand about your problem, I am 99% sure the problem is a memory problem either during reading out the list of files (not sure if your rejected this hypothesis yet) or, and more likely, a memory problem when passing the list of files to the external process. – MarianP Aug 19 '11 at 19:32
0

mvn is probably launching a JVM with different options than your IDE.

Real differences between "java -server" and "java -client"? can cause substantial differences in performance for long running applications, but sometimes differences in gc flags can cause performance differences when your app uses memory in different ways.

If your process is memory bound and so thrashes at some memory level, heap flags like -Xmx can drastically affect performance too. Memory profiling can easily rule this in or out.

To diagnose these differences, look at your mvn configuration files to figure out how it is launching a JVM, and look at your IDE java app launcher configuration.

Community
  • 1
  • 1
Mike Samuel
  • 118,113
  • 30
  • 216
  • 245
  • If you look at my edit, I stopped using mvn and started using just a simple compiled jar file with a main class. Still same affect. – Amir Raminfar Aug 19 '11 at 17:11
0

Have you tried using a profiler such as VisualVM to see what is is the bottleneck? Also compare the CPU usage graphs of your computer (Mac should have some system monitor). Maybe the process is blocking at some point and not working effectively, which could be seen as a CPU usage graph of different shape.

Esko Luontola
  • 73,184
  • 17
  • 117
  • 128