
I ran a simple performance comparison between Go (1.11) and Java (1.8) on my Mac (macOS Mojave, 4-core i5, 16 GB memory) and found that, when reading a small file, Go is 6~7 times faster than Java. Below is my test code. I want to confirm whether my test code is wrong or whether I missed something.

  1. Java

with java.util.concurrent.ExecutorService

import java.io.*;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;



class TaskWithResult implements Callable<String> {
    private static String readToString() {
        String fileName = "/Users/pis/IdeaProjects/Test/src/data/test.txt";
        File file = new File(fileName);
        byte[] filecontent = new byte[(int) file.length()];
        try {
            FileInputStream in = new FileInputStream(file);
            in.read(filecontent);
            in.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
        SimpleDateFormat myFmt=new SimpleDateFormat("yyyy-MM-dd HH: mm: ss: SSS: ");
        Date d1 = new Date();
        return myFmt.format(d1);
    }

    /**
     * The actual task. Once it is handed to ExecutorService's submit method,
     * this method is automatically executed on a pool thread.
     */
    public String call() throws Exception {
        String result = readToString();
        System.out.println(result);
        // This return value can be retrieved via Future's get method
        return result;
    }
}


public class readFile{
    public static void main(String args[]){
        ExecutorService es = Executors.newFixedThreadPool(5);
        List<Future<String>> resultList = new ArrayList<Future<String>>();
        SimpleDateFormat myFmt=new SimpleDateFormat("yyyy-MM-dd HH: mm: ss: SSS");
        Date d1 = new Date();
        System.out.println("Start Time:"+myFmt.format(d1));
        for (int i = 0; i < 1000; i++){
            // Submit a Callable task to the ExecutorService and keep its result in a Future
            Future<String> future = es.submit(new TaskWithResult());
            // Store the Future in the list
            resultList.add(future);
        }
        // Shut down the pool so the JVM can exit once the tasks finish
        es.shutdown();
    }

}
  2. Go

with channel

package main

import (
    "fmt"
    "io/ioutil"
    "time"
)

func readFile(fileName string, p chan string) {
    f, err := ioutil.ReadFile(fileName)
    if err != nil {
        fmt.Println("read file error")
    }
    p <- string(f)
}

func main() {
    le := 1000
    p := make(chan string, le)
    start := time.Now()
    for i:=0;i<le;i++{
        go readFile("test.txt", p)
    }
    fmt.Printf("Start Time: %s\n", start)
    for i:=0;i<le;i++{
        <-p
        fmt.Printf("End Time: %s, duration: %f\n", time.Now(), time.Since(start).Seconds())
    }

}
  • Result

    • Go: completes all tasks in about 0.0197 s (19.7 ms)

      Start Time: 2018-12-24 15:30:50.333694 +0800 CST m=+0.000325519

      ...

      End Time: 2018-12-24 15:30:50.353409 +0800 CST m=+0.020040254, duration: 0.019715

    • Java: completes all tasks in about 122 ms

      Start Time:2018-12-24 15: 30: 31: 301

      ...

      2018-12-24 15: 30: 31: 422

My test data file is a very simple text file of a few lines (about 362 B). Is there something wrong with my test code for comparing small-file reads between Go and Java? Could somebody please help me out? Thanks in advance :)

Lau Real
    Yes there's a problem. The problem is that the main thing you're testing is the startup overhead for each runtime. – Jonathan Hall Dec 24 '18 at 08:50
  • 2
    Microbenchmark. Waste of time. – Volker Dec 24 '18 at 08:54
  • @Flimzy So the result shows that the Go runtime's startup for goroutines is faster than Java's? – Lau Real Dec 24 '18 at 09:43
  • 3
    Well, what a surprise... Native code vs byte code... Guess who wins ***always*** on the first 10 yards... – Markus W Mahlberg Dec 24 '18 at 09:44
  • "start up for goroutine"? What does that mean? All I'm saying is what is already obvious: Go has lower startup overhead than Java, and this is what your benchmark shows. – Jonathan Hall Dec 24 '18 at 09:48
  • 1
    https://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java . look at that for the java. For the Go, use the testing package, see https://dave.cheney.net/2013/06/30/how-to-write-benchmarks-in-go – Vorsprung Dec 24 '18 at 09:50
  • 1
    basically (sorry) your approach is so flawed it's difficult to list all the problems – Vorsprung Dec 24 '18 at 09:51
  • @Vorsprung Thanks very much for your sharing, I'll do further research with your links. – Lau Real Dec 25 '18 at 01:49

1 Answer


I see several problems with that, both from a conceptual point of view as well as a technical.

You use a channel to return your result set (good, kind of), but then you simply throw the results away. The single receive loop in main also serializes the output, so you have a choking point there. Note that this is not a problem per se, since pipelines are a great way of structuring your program - you simply used one the wrong way here, imho. Something along the lines of

package main

import (
    "fmt"
    "io/ioutil"
    "sync"
    "time"
)

func main() {
    le := 1000

    // The file we read repeatedly; adjust the path as needed
    fileName := "test.txt"

    // We want to wait until the operations finish
    var wg sync.WaitGroup

    start := time.Now()
    for i := 0; i < le; i++ {

        // Add an operation to wait for to the group
        wg.Add(1)
        go func() {
            // Ensure the WaitGroup is notified we are done (bar a panic)
            defer wg.Done()

            // Short notation, since we are not interested in the result set.
            // Note that err is local to each goroutine: sharing a single err
            // variable across goroutines would be a data race.
            if _, err := ioutil.ReadFile(fileName); err != nil {
                fmt.Println("read file error")
            }
        }()
    }

    // Wait until all operations are finished.
    wg.Wait()
    fmt.Printf("%d iterations took %s\n", le, time.Since(start))
}

would be my solution, if I had to do something like this.

But if we take a closer look at the code, basically the only working component here is ioutil.ReadFile. Using it for program parts that are worth benchmarking is a Very Bad Idea™ in the first place. It is meant for rather small files (config files, for example) - which is in itself rather not a part of your program you want to benchmark.

What you do want to benchmark is the processing logic for the files you just read. Let me give you an example: say you want to read in a bunch of small JSON files, unmarshal them, modify them, marshal them again and send them to a REST API. Which part of your program would you want to benchmark in this case? My bet is on the logic processing the files, because that is the part of the program you can actually optimize. You can optimize neither ioutil.ReadFile nor the server - unless you happen to write that one, too, in which case you would want to benchmark the server logic from within the server package.

Last, but not least, your question is titled "IO Performance between Go and Java". To actually measure IO performance, you would need very large IO operations. I tend to use ISO images for this - real-world data I tend to have lying around.

Markus W Mahlberg
  • Thanks very much for the very detail answer! Now I have a better understanding on performance scenario. – Lau Real Dec 25 '18 at 01:46
  • @LauReal, «To actually measure IO performance, you would need very large IO operations.»—and this mostly measures the speed at which the page cache→filesystem→io scheduler→drive perform _sequential_ (on different levels) I/O. Conversely, processing lots of small files—especially if they are located in different directories—would tilt this picture quite noticeably as you'd then measure different aspects of the various blocks in the chain above. – kostix Dec 25 '18 at 16:01
  • @LauReal, All in all, I'd recommend reading a good background book on how a (typical) OS is implemented and then proceed with learning more low-level details about the OS you're targeting. And it worth reiterating that benchmarks like yours—unless you manage to build some gross bloopers in them, like woefully ineffective memory management,—are going to be dominated by the time spent in the system calls into the OS, which actually performs the I/O in the commodity OSes of the present day. – kostix Dec 25 '18 at 16:03
  • @kostix I kind of disagree. That would be based on the assumption that files are actually written sequentially. On a different level, one might actually want to know how a certain program behaves on different file- and/or operating systems. So while in fact one measures a lot of OS implementation details using benchmarks, this might well be the actual idea. – Markus W Mahlberg Dec 25 '18 at 16:06