2

I am running an experiment on varying thread count for java 8 parallel stream's reduction operation: summation over 11.5 million numbers. I run it on 8 core intel xeon processor. I do not see much variation in the total execution time and total CPU time, when I vary the number of threads from 16 to 1. I do get speed-up over sequential version which uses 'streams' instead of `parallel streams'. I want to understand the correlation between no of threads and the speed-up over sequential reduction.

Can somebody help me with the reasoning for this please? Is my code incorrect somewhere?

Source code is here, line 74 performs parallel reduction.

Relevant part of the code is pasted below as well:

class RedOperator implements BinaryOperator<MonResult>{

    @Override
    public MonResult apply(MonResult t, MonResult u) {
        // TODO Auto-generated method stub
        if(t != null && u != null)
            t.count = t.count + u.count;
        return t;
    }

}

class FOFinisher implements Function<ConcurrentHashMap<ArrayList<String>, ArrayList<MonResult>>, ArrayList<MonResult>>{


    protected MonResult id;
    protected boolean isParallel;
    public FOFinisher(boolean isparallel){
        id = new MonResult();
        id.count = 0;
        this.isParallel = isparallel;
    }
    public long getJVMCpuTime() {
        long lastProcessCpuTime = 0;
        try {
            if (ManagementFactory.getOperatingSystemMXBean() instanceof OperatingSystemMXBean) {
                lastProcessCpuTime=((com.sun.management.OperatingSystemMXBean)ManagementFactory.getOperatingSystemMXBean()).getProcessCpuTime();
            }
        }
        catch (  ClassCastException e) {
            System.out.println(e.getMessage());
        }finally{
            return lastProcessCpuTime;
        }
    }
    @Override
    public ArrayList<MonResult> apply(ConcurrentHashMap<ArrayList<String>, ArrayList<MonResult>> t) {
        // TODO Auto-generated method stub

        long beg = System.nanoTime();
        long begCPU = getJVMCpuTime();
        ArrayList<MonResult> ret;
        RedOperator x = new RedOperator();
        if(isParallel){
            ret = new ArrayList<MonResult>(t.values().stream().map((alist)->{
                        return alist.parallelStream().reduce(id,x);
                    }).collect(Collectors.toList()));
        }
        else{
            ret = new ArrayList<MonResult>(t.values().stream().map((alist)->{
                        return alist.stream().reduce(id,x);
                    }).collect(Collectors.toList()));
        }

        long end = System.nanoTime();
        long endCPU = getJVMCpuTime();
        System.out.println("Exec Time:" + TimeUnit.MILLISECONDS.convert((end-beg),TimeUnit.NANOSECONDS));
        System.out.println("CPU Time : " + TimeUnit.MILLISECONDS.convert((endCPU-begCPU),TimeUnit.NANOSECONDS));
        return ret;
    }

}
ntalbs
  • 28,700
  • 8
  • 66
  • 83
Yogi Joshi
  • 786
  • 1
  • 6
  • 19
  • 2
    Please post pertinent code in the question, do not link. Links die, so the question loses all context. – Andy Turner Aug 10 '15 at 06:01
  • 1
    Relevant: http://stackoverflow.com/q/504103/3788176 – Andy Turner Aug 10 '15 at 06:03
  • 1
    If you managed to produce a self-contained example to post here, which I could run to confirm your finding, then I (and others) would be able to help you. – Marko Topolnik Aug 10 '15 at 13:38
  • 1
    Have you tried to parallelize the outer stream (`t.values().parallelStream()`) instead? Also it would be more efficient to use `ret = t.values().<...>.collect(Collectors.toCollection(ArrayList::new));` instead of `new ArrayList<>(t.values()...collect(Collectors.toList())`. – Tagir Valeev Aug 10 '15 at 15:25

1 Answers1

2

You do not see any speed-up comparing to sequential version because most of time is spent on splitting and joining tasks. In shortcut - the problem which you took under test is too simple. Please have a look at Angelika Langer's Geecon presentation for more details.

k0ner
  • 1,086
  • 7
  • 20