I wrote a small class that retrieves data from a CSV file and loads it into a POJO. Since I need frequent access to this data, I wrote a singleton class that checks whether the data is already in the object; if it is, it returns the data directly from the object (without reading the file again). Otherwise, it retrieves the data from the file and stores it in the object for future queries.

When testing, I noticed it takes roughly 175 milliseconds to access the data 10,000 times (including the first access, which loads the data from the file).

What struck me was that when I looped 20,000 times it took only 177 milliseconds (only two milliseconds more than 10,000 times) and 50,000 times took only about 197 milliseconds.

What is the explanation that it is so much quicker to do 50K vs. 10K? Why doesn't the time increase proportionally?

Also, why is accessing data directly from the object so much faster than accessing it from disk? (When I access it via the file, a single access takes about 160 milliseconds.)

Thanks

Update:

Perhaps even more perplexing is that when I access the object using two different keys (which requires two reads from file), it takes roughly the same amount of time (within 1 millisecond) as accessing it once. The explanations about object access being 200K times faster than file access only explain my first observation. Now I'm actually reading data from two different files, yet I don't see a proportional increase in the time it takes.

In other words, doing this:

    for (int counter = 0; counter < 1; counter++) {
        POJOObj.getInstance().getKey("Key1", "Val1");
    }

takes the same amount of time as doing this:

    for (int counter = 0; counter < 1; counter++) {
        POJOObj.getInstance().getKey("Key1", "Val1");
        POJOObj.getInstance().getKey("Key1", "Val2"); // this requires a new read from file
    }

Why does the time not increase proportionally?

S.O.S
  • Accessing from a file requires disk I/O. Accessing from the "object" is straight from memory. See: https://stackoverflow.com/questions/1371400/how-much-faster-is-the-memory-usually-than-the-disk You should also read up on basics of how caching works, which explains why it doesn't "increase proportionally" as you say. – noahnu Aug 29 '18 at 01:32
  • *"Why does the time not increase proportionally?"* Likely because your performance-testing logic is flawed. See [How do I write a correct micro-benchmark in Java?](https://stackoverflow.com/q/504103/5221149) – Andreas Aug 29 '18 at 02:56

1 Answer


Reading a file from disk is much slower compared to reading the data from memory. There is a pretty good resource called "Latency Numbers Every Developer Should Know" that explains part of this. Essentially, reading 1MB from disk is about 200,000x slower than reading the same from main memory.
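As a rough back-of-the-envelope check, the timings in the question can be split into a one-time cost plus a per-access cost. The sketch below assumes the total time is simply `fixed + n * perAccess`, which is a simplification (it ignores JIT warm-up and timer noise). The class and method names are made up for illustration.

```java
public class FixedCostEstimate {
    // Marginal cost of one access in milliseconds, derived from two
    // (iterations, total milliseconds) measurements.
    static double perAccessMillis(int n1, double t1, int n2, double t2) {
        return (t2 - t1) / (n2 - n1);
    }

    // One-time cost implied by a single measurement and a per-access cost.
    static double fixedMillis(int n, double totalMillis, double perAccess) {
        return totalMillis - n * perAccess;
    }

    public static void main(String[] args) {
        // Figures reported in the question: 10,000 accesses in 175 ms,
        // 50,000 accesses in 197 ms.
        double perAccess = perAccessMillis(10_000, 175.0, 50_000, 197.0);
        double fixed = fixedMillis(10_000, 175.0, perAccess);
        System.out.println("per access (ms): " + perAccess);
        System.out.println("fixed cost (ms): " + fixed);
    }
}
```

With those figures, the 40,000 extra accesses cost 22 ms in total, or about 0.00055 ms each, and the implied one-time cost is about 169.5 ms. That is close to the roughly 160 ms the question reports for a single read from file, which suggests the totals are dominated by the one file read rather than by the loop.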

As for why you see quicker response times from your method: HotSpot (the JVM's just-in-time compiler) has probably kicked in. When you execute a method frequently in Java, the JVM will detect this after some threshold (I want to say it's around 10k invocations, but don't trust me on that) and optimize the method. It does this by compiling the interpreted bytecode you had been executing to native machine code, often inlining calls along the way. This is much faster, and happens behind the scenes. Writing microbenchmarks such as yours is exceptionally difficult, and there are a lot of ways to get them wrong. Check out this resource from Oracle on some of the pitfalls and how to avoid them with a tool called JMH, if you are interested in exploring these numbers further.
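To see the one-time cost in isolation, here is a minimal sketch of a cached lookup. It is not the asker's `POJOObj`: `CachedLookupDemo`, `loadSlowly`, and the 100 ms sleep are hypothetical stand-ins for the CSV read. It is also a naive timing loop, not a proper benchmark, so treat the printed times as indicative only.

```java
import java.util.HashMap;
import java.util.Map;

public class CachedLookupDemo {
    private static final Map<String, String> cache = new HashMap<>();
    static int loads = 0; // counts how many times the slow path ran

    // Hypothetical stand-in for parsing the CSV file; the sleep mimics slow I/O.
    static String loadSlowly(String key) {
        loads++;
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "value-for-" + key;
    }

    // Returns the cached value, loading it only on the first request for a key.
    static String get(String key) {
        return cache.computeIfAbsent(key, CachedLookupDemo::loadSlowly);
    }

    // Times `iterations` lookups of the same key, in milliseconds.
    static long timeMillis(int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            get("Key1");
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10_000, 20_000, 50_000}) {
            cache.clear(); // force one slow load per run
            System.out.println(n + " accesses: " + timeMillis(n) + " ms");
        }
        System.out.println("slow loads: " + loads);
    }
}
```

Each run pays for the slow load exactly once, so the three totals should all sit slightly above the simulated 100 ms regardless of the iteration count. For trustworthy numbers, a harness such as JMH is the better tool.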

Todd
  • *reading 1MB from disk is about 200,000x slower than reading the same from main memory* And that's ideal, probably for something like a *fast* SSD. Screw your IO up badly enough on a slow SATA drive that can only do about 40 or 50 head seeks/second and reading a MB can take up to 30 or 40 seconds. – Andrew Henle Aug 29 '18 at 10:35