
I'm building a test harness in Java, and trying to compare the performance and latency of two parsers. The parsers munge data coming off of a live, single feed. I have no control over the feed, nor do I have a "simulated feed" for mocking data, so to compare apples to apples, I'd like to run my parsers as concurrently as possible. I'm new to Java and threading, so I'm not sure if this is the best approach. My idea was to spin up 2 threads:

// feed must be (effectively) final so the anonymous classes below can capture it
final SomeFeed feed = new SomeFeed();

Thread thread1 = new Thread() {
  public void run() {
    parser1.parseFeed(feed);
  }
};
Thread thread2 = new Thread() {
  public void run() {
    parser2.parseFeed(feed);
  }
};
thread1.start();
thread2.start();

Will threads started this way run roughly simultaneously? Or is there a better approach?

Thanks

Adam Hughes
  • If you can run two parsers from the same feed concurrently, you could also simply run them sequentially. Running concurrently opens a Pandora's box of uncertainties. But I suspect neither will work, because the feed will not support it. – Durandal Feb 03 '16 at 20:59
  • We're hesitant to run them sequentially because the feed's output can change dramatically and unpredictably. – Adam Hughes Feb 04 '16 at 13:32
  • This is solving the wrong problem. You should ask how you could change your design to provide a way to mock data (there are several well-established and simple methods for doing so) instead of trying to solve a next-to-impossible problem. Not only will this greatly improve the reliability of your performance tests, it will also allow you to test your parsers, which is always a good thing. – Voo Feb 04 '16 at 13:55
  • Also you really shouldn't implement your own benchmark methods when there's [JMH](http://openjdk.java.net/projects/code-tools/jmh/). There are just way too many pitfalls when benchmarking Java and while they shouldn't have a big influence in this particular case it's a good idea to use standard tools in any case. – Voo Feb 04 '16 at 13:58

3 Answers


Having two threads run exactly in parallel isn't something you can really control. But if you care about starting them at (almost) the same time, you can use a CyclicBarrier (taken from here):

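import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

// We want to start just 2 threads at the same time, but let's control that
// timing from the main thread. That's why we have 3 "parties" instead of 2.
final CyclicBarrier gate = new CyclicBarrier(3);

Thread t1 = new Thread() {
    public void run() {
        try {
            gate.await(); // await() throws checked exceptions, so catch them here
        } catch (InterruptedException | BrokenBarrierException e) {
            return; // interrupted or barrier broken: skip the work
        }
        // do stuff
    }
};
Thread t2 = new Thread() {
    public void run() {
        try {
            gate.await();
        } catch (InterruptedException | BrokenBarrierException e) {
            return;
        }
        // do stuff
    }
};

t1.start();
t2.start();

// At this point, t1 and t2 are blocking on the gate.
// Since we gave "3" as the argument, the gate is not open yet.
// Now if we block on the gate from the main thread, it will open
// and all threads will start to do stuff!
// (The enclosing method must handle or declare InterruptedException
// and BrokenBarrierException for this call.)
gate.await();
System.out.println("all threads started");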

This will get you the closest to starting them at the same time.
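If you want to see how close "the same time" actually gets on your machine, one option is to have each thread read System.nanoTime() immediately after it passes the barrier and compare the two readings. A minimal sketch (the class and method names are made up for illustration; the skew you measure depends on core count, OS scheduling and load, so treat it as a rough indication only):

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

class StartSkewDemo {
    // Starts two threads behind a CyclicBarrier and returns the absolute
    // difference (in nanoseconds) between the moments each one got past it.
    static long measureStartSkew() throws InterruptedException {
        final CyclicBarrier gate = new CyclicBarrier(3); // 2 workers + main
        final long[] startTimes = new long[2];

        Thread[] workers = new Thread[2];
        for (int i = 0; i < 2; i++) {
            final int id = i;
            workers[i] = new Thread(() -> {
                try {
                    gate.await();
                } catch (InterruptedException | BrokenBarrierException e) {
                    return;
                }
                startTimes[id] = System.nanoTime(); // first thing after release
            });
            workers[i].start();
        }

        try {
            gate.await(); // main is the third party: this opens the gate
        } catch (BrokenBarrierException e) {
            throw new IllegalStateException(e);
        }
        for (Thread w : workers) {
            w.join(); // join() also makes the workers' writes visible to main
        }
        return Math.abs(startTimes[0] - startTimes[1]);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("start skew: " + measureStartSkew() + " ns");
    }
}
```

join() is used here only so that main can safely read the recorded timestamps: everything a thread does happens-before another thread's join() on it returns.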

Idos
  • Cool, that certainly helps. Any idea how close to truly parallel threads in Java typically run? Does it depend on, for example, my CPU architecture? – Adam Hughes Feb 03 '16 at 18:16
  • It depends on the number of cores. If you have 1, you can pretty much *bank* on it not running well. But with more than 1 it will be reasonably parallel. – Idos Feb 03 '16 at 18:17
  • Thank you. Also, let's say run() finishes on t1 but is still running on t2. Is there a way to know when both have finished? And more so, to rerun the operation? I want to do this maybe 20 or so times. CyclicBarrier sounds like it can run in a loop, although I guess I could just use a for loop as well. – Adam Hughes Feb 03 '16 at 18:19
  • The documentation (linked in the answer) has some nice features like resetting the barrier and getParties(). But to check whether both have finished, you can call getState()/isAlive() on the threads themselves (probably in a while loop). Running the threads multiple times sounds to me like a good job for a `for` loop, like you said :) – Idos Feb 03 '16 at 18:22
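For the follow-up about knowing when both threads are done and repeating about 20 times, here is one possible sketch. It swaps the isAlive() polling from the comment for Thread.join(), which simply blocks until a thread has finished, and wraps everything in a plain for loop. The two Runnable parameters are hypothetical stand-ins for the parseFeed calls:

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

class RepeatedRounds {
    // Runs `rounds` rounds; in each round two fresh threads are released
    // together by a barrier, and main waits for both of them with join().
    static int runRounds(int rounds, Runnable task1, Runnable task2)
            throws InterruptedException {
        int completed = 0;
        for (int r = 0; r < rounds; r++) {
            CyclicBarrier gate = new CyclicBarrier(3); // 2 workers + main
            Thread t1 = new Thread(() -> awaitThenRun(gate, task1));
            Thread t2 = new Thread(() -> awaitThenRun(gate, task2));
            t1.start();
            t2.start();
            try {
                gate.await(); // release both workers
            } catch (BrokenBarrierException e) {
                throw new IllegalStateException(e);
            }
            t1.join(); // blocks until t1 has finished
            t2.join(); // blocks until t2 has finished
            completed++;
        }
        return completed;
    }

    private static void awaitThenRun(CyclicBarrier gate, Runnable task) {
        try {
            gate.await();
        } catch (InterruptedException | BrokenBarrierException e) {
            return; // do not run the task if the start signal failed
        }
        task.run();
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger a = new AtomicInteger();
        AtomicInteger b = new AtomicInteger();
        int done = runRounds(20, a::incrementAndGet, b::incrementAndGet);
        System.out.println(done + " rounds, task1 ran " + a.get() + "x, task2 ran " + b.get() + "x");
    }
}
```

A CyclicBarrier does reset itself after it trips, so one instance could be reused across rounds; creating a fresh one per round just keeps the sketch simple and avoids carrying a broken barrier into the next round.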

That is one way of doing things. Another way is to implement the Runnable interface:

public class SomeFeed implements Runnable {

    public void run() {
        System.out.println("Hello from a thread!");
    }

    public static void main(String args[]) {
        (new Thread(new SomeFeed())).start();
    }

}

A more modern approach is to use a thread pool (an ExecutorService).

This is how you could create a pool and execute your code:

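import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

ExecutorService pool = Executors.newFixedThreadPool(2);

for (int i = 0; i < 2; i++) {
    pool.submit(new SomeFeed());
}
pool.shutdown(); // accept no new tasks; already-submitted ones still run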

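SomeFeed can implement either Runnable or Callable: submit() accepts both, and Callable is the one to use if the task needs to return a result or throw a checked exception.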

More information can be found here
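As an illustration of the Callable route, here is a sketch that times each task inside the pool and hands the elapsed nanoseconds back through a Future. The two Runnable arguments are placeholders for something like parser1.parseFeed(feed) and parser2.parseFeed(feed); the sketch only shows the ExecutorService mechanics and does not address the fairness concerns raised in the comments and other answers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PoolTiming {
    // Wraps a task in a Callable that returns how long it took, in nanoseconds.
    static Callable<Long> timed(Runnable work) {
        return () -> {
            long start = System.nanoTime();
            work.run();
            return System.nanoTime() - start;
        };
    }

    // Runs both tasks on a 2-thread pool and returns their elapsed times in order.
    static List<Long> runBoth(Runnable parse1, Runnable parse2)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Callable<Long>> tasks = Arrays.asList(timed(parse1), timed(parse2));
            List<Long> elapsed = new ArrayList<>();
            for (Future<Long> f : pool.invokeAll(tasks)) { // waits for all tasks
                elapsed.add(f.get());
            }
            return elapsed;
        } finally {
            pool.shutdown(); // let the pool's threads exit
        }
    }

    public static void main(String[] args) throws Exception {
        List<Long> ns = runBoth(() -> { }, () -> { });
        System.out.println("task1: " + ns.get(0) + " ns, task2: " + ns.get(1) + " ns");
    }
}
```

invokeAll() blocks until every task has completed, so there is no need to poll the threads; the futures come back in the same order as the tasks were passed in.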

allthenutsandbolts

Regardless of your single-feed problem, running the parsers concurrently is about the worst way to compare them.

Record the feed once and keep it as the reference data set for both tests.
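A minimal sketch of the record-and-replay idea, assuming the feed can be read as a sequence of String messages. Here a Supplier<String> stands in for the live feed and each parser is modeled as a Consumer<String>; both are assumptions for illustration, not the actual SomeFeed or parser API from the question.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

class RecordReplay {
    // Pulls `count` messages from the live source once and freezes them.
    static List<String> record(Supplier<String> liveFeed, int count) {
        List<String> captured = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            captured.add(liveFeed.get());
        }
        return Collections.unmodifiableList(captured);
    }

    // Feeds the exact same recorded messages to a parser, in order.
    static void replay(List<String> recorded, Consumer<String> parser) {
        for (String message : recorded) {
            parser.accept(message);
        }
    }

    public static void main(String[] args) {
        int[] n = {0};
        List<String> data = record(() -> "msg-" + n[0]++, 5); // fake "live" feed
        StringBuilder seenBy1 = new StringBuilder();
        StringBuilder seenBy2 = new StringBuilder();
        replay(data, seenBy1::append); // stand-in for parser 1
        replay(data, seenBy2::append); // stand-in for parser 2
        System.out.println(seenBy1.toString().equals(seenBy2.toString())); // true
    }
}
```

Because the recorded list is unmodifiable and replayed in order, both parsers see exactly the same input, and each can be timed on its own without competing with the other.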

At the very least, you need to measure the time each one takes while the CPU is roughly free of other work, with no interference.

And you should also do more than 1500 runs to get a fair measure of the average time the routine takes (1500 invocations is the HotSpot client compiler's threshold for JIT-compiling a method, which speeds up the code; the server compiler's default threshold is 10,000). Also, try to keep the code running for at least 30 seconds, so that interference from OS and disk variations is averaged out too.
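A rough sketch of the warm-up-then-measure pattern: run the parser untimed enough times for the JIT to kick in, then average many timed runs. The parser is modeled as a hypothetical Consumer<List<String>> over recorded data. A hand-rolled loop like this still falls into several well-known traps (dead-code elimination, on-stack replacement, run-order effects), which is why JMH, mentioned in the comments on the question, is the safer choice for real numbers.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

class WarmupBenchmark {
    // Runs `parser` over `data` warmupRuns times untimed (giving the JIT a chance
    // to compile the hot code), then returns the mean of timedRuns runs in ns.
    static double averageNanos(Consumer<List<String>> parser, List<String> data,
                               int warmupRuns, int timedRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            parser.accept(data);
        }
        long total = 0;
        for (int i = 0; i < timedRuns; i++) {
            long start = System.nanoTime();
            parser.accept(data);
            total += System.nanoTime() - start;
        }
        return (double) total / timedRuns;
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("a,b,c", "d,e,f", "g,h,i");
        long[] sink = {0}; // results accumulate here so the work is actually used
        // Two hypothetical "parsers" standing in for parser1 and parser2.
        Consumer<List<String>> parserA = d -> {
            for (String s : d) sink[0] += s.split(",").length;
        };
        Consumer<List<String>> parserB = d -> {
            for (String s : d) sink[0] += s.indexOf(',');
        };
        double a = averageNanos(parserA, data, 20_000, 20_000);
        double b = averageNanos(parserB, data, 20_000, 20_000);
        System.out.printf("A: %.0f ns/run, B: %.0f ns/run (sink=%d)%n", a, b, sink[0]);
    }
}
```

The two parsers are measured one after the other, never at the same time, so neither can steal CPU from the other during its timed runs.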

And if you notice a GC pause (always enable a GC log during benchmarks), then you have to run the test either with more memory to avoid a full GC (use -Xmx2G, for example), or often enough that both parsers incur roughly the same number of full GCs.

That said, memory abuse, and the GC time it causes, is also a legitimate performance factor when judging which parser is worse.
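To put a number on the GC cost per parser, the standard java.lang.management API exposes cumulative collection counts and times for each collector. A sketch that takes a before/after snapshot around a run (the workload in main is an arbitrary allocation-heavy stand-in, not a real parser):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class GcCost {
    // Total collections so far, summed over all collectors (-1 = undefined, skipped).
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long c = gc.getCollectionCount();
            if (c > 0) {
                total += c;
            }
        }
        return total;
    }

    // Approximate accumulated GC time in milliseconds, summed over all collectors.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // Runs the task and returns {collections, gcMillis} that happened meanwhile.
    static long[] gcDuring(Runnable task) {
        long countBefore = totalGcCount();
        long millisBefore = totalGcMillis();
        task.run();
        return new long[] {totalGcCount() - countBefore, totalGcMillis() - millisBefore};
    }

    public static void main(String[] args) {
        List<byte[]> retained = new ArrayList<>();
        long[] cost = gcDuring(() -> {
            for (int i = 0; i < 200_000; i++) {
                retained.add(new byte[1024]); // allocation-heavy stand-in workload
                if (retained.size() > 1_000) {
                    retained.clear();
                }
            }
        });
        System.out.println("collections: " + cost[0] + ", GC time: " + cost[1] + " ms");
    }
}
```

These counters are JVM-wide, so any GC triggered by another thread during the run is counted too. That is one more reason to measure the parsers one at a time rather than concurrently.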

user2023577
  • Thanks for the answer. Unfortunately, it's not simply data from a feed; there's a ton going on in the backend that would take me a huge amount of time to build into a mock feed. I wonder whether, if I run the programs long enough (not simultaneously), I can count on the law of averages to give a better idea of performance. – Adam Hughes Feb 03 '16 at 20:27
  • PS, why is running concurrently bad? I'm more worried about the difference in parser processing time than the absolute processing time. So wouldn't running concurrently at least let me compare performance under similar conditions (conditions that also reflect my real runtime conditions)? – Adam Hughes Feb 03 '16 at 20:36
  • Simply because one parser sucks up CPU and memory (perhaps even network and I/O), which is unfair to the other parser, and vice versa... no? You have no guarantee that there are 2 dedicated CPUs, 2 dedicated disks, 2 dedicated 'everything' to establish a fair test. The first parser will slow down the second one and vice versa, but the OS might grant more time and I/O to the heavy parser, making it look too good and the light parser look bad. OSes are too smart and definitely not linear. – user2023577 Feb 03 '16 at 20:58