0

I was trying to add 10 Million records in an array list using Thread pool of size 4(On octa core processor). But it is taking double the time compared to single threaded code.

Below are the code snippets. I might be doing something wrong. Can anyone explain what is the issue in code?

package com.shree.test;

public class Student {
    private int id;
    private String name;
    private int age;
    private int std;

    public Student(int id, String name, int age, int std) {
        super();
        this.id = id;
        this.name = name;
        this.age = age;
        this.std = std;
    }
    public int getId() {
        return id;
    }
    public void setId(int id) {
        this.id = id;
    }
    public String getName() {
        return name;
    }
    public void setName(String name) {
        this.name = name;
    }
    public int getAge() {
        return age;
    }
    public void setAge(int age) {
        this.age = age;
    }
    public int getStd() {
        return std;
    }
    public void setStd(int std) {
        this.std = std;
    }
    @Override
    public String toString() {
        return "Student [id=" + id + ", name=" + name + ", age=" + age + ", std=" + std + "]";
    }
}

Multi Threaded code(with Threadpool):

    package com.shree.test;

    import java.time.LocalDateTime;
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    import org.apache.commons.lang3.RandomStringUtils;
    import org.apache.commons.lang3.RandomUtils;

    class Task implements Callable<Student>{

        private static final String CHAR_SET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
        private int id;
        private List<Student> studentList;

        public Task(int id,List<Student> studentList) {
            this.id = id;
            this.studentList = studentList;
        }

        @Override
        public Student call() throws Exception {
            Student student =  new Student(id, RandomStringUtils.random(RandomUtils.nextInt(5, 10), CHAR_SET), RandomUtils.nextInt(10, 15), RandomUtils.nextInt(4, 9));
            studentList.add(student);
            return student;
        }
    }

    public class MultiThreadStudentListGenerator {

        private List<Student> students = Collections.synchronizedList(new ArrayList<>());

        private ExecutorService threadPool = Executors.newFixedThreadPool(4);

        public void generateStudentList() {
            for(int i=0;i<10000000;i++) {
                threadPool.submit(new Task(i, students));
            }
            threadPool.shutdown();  
        }

        public void process() {
            generateStudentList();
        }

        public int getSize() {
            return students.size();
        }

        public void addShutDownhook(LocalDateTime dateTime1 ) {
            Runtime.getRuntime().addShutdownHook(new Thread() {
                public void run() {
                    LocalDateTime dateTime2 = LocalDateTime.now();

                    long diffInMilli = java.time.Duration.between(dateTime1, dateTime2)
                            .toMillis();

                    System.out.println("Time taken in Miliseconds: " + diffInMilli);
                    System.out.println("List Size: " + getSize());
                }
            });
        }

        public static void main(String[] args) {

            MultiThreadStudentListGenerator multiThreadStudentListGenerator = new MultiThreadStudentListGenerator();

            LocalDateTime dateTime1 = LocalDateTime.now();
            multiThreadStudentListGenerator.addShutDownhook(dateTime1);

            multiThreadStudentListGenerator.process();
        }
    }

Single Threaded code:

package com.shree.test;

import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.lang3.RandomStringUtils;
import org.apache.commons.lang3.RandomUtils;

public class SingleThreadStudentListGenerator {

    private static final String CHAR_SET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    private List<Student> students = new ArrayList<>();

    public void generateStudentList() {
        for (int i = 0; i < 10000000; i++) {
            Student student = new Student(i, RandomStringUtils.random(RandomUtils.nextInt(5, 10), CHAR_SET),
                    RandomUtils.nextInt(10, 15), RandomUtils.nextInt(4, 9));
            students.add(student);
        }
    }

    public void process() {
        generateStudentList();
    }

    public int getSize() {
        return students.size();
    }

    public void addShutDownhook(LocalDateTime dateTime1) {
        Runtime.getRuntime().addShutdownHook(new Thread() {
            public void run() {
                LocalDateTime dateTime2 = LocalDateTime.now();

                long diffInMilli = java.time.Duration.between(dateTime1, dateTime2).toMillis();

                System.out.println("Time taken in Miliseconds: " + diffInMilli);
                System.out.println("Size: " + getSize());
            }
        });
    }

    public static void main(String[] args) {

        SingleThreadStudentListGenerator mainClass = new SingleThreadStudentListGenerator();

        LocalDateTime dateTime1 = LocalDateTime.now();
        mainClass.addShutDownhook(dateTime1);

        mainClass.process();

    }
}
Shrirang Kumbhar
  • 363
  • 4
  • 17

2 Answers2

1

Two main problems:

  • how you measure. Your idea to use the shutdown hook, that is really weird. You should rather use a framework like JMH to do such testing, see here for guidance how to get to meaningful numbers
  • then, here: Collections.synchronizedList(new ArrayList<>()). That creates a list that synchronizes on add/remove requests. That means: locking. Guess what: locking is expensive.

In other words: A) the way you acquire your numbers is dubious and B) locking is more expensive than not locking. In your case, your 4 threads will constantly run into each other, and will have to wait for another thread to be done modifying the list.

Just imagine: when you have one shovel, and you need to dig a hole ... would you really benefit from hiring 4 guys to do the work? Or would it rather be: 3 guys watching the 4th using the shovel?!

As in: note that using multiple threads only saves you overall execution time in specific situations, for example when your workload needs to wait for I/O very often. A CPU intensive workload (and that is what your code is doing) isn't profiting from "more threads". To the contrary. More threads, that can mean context switches, locking, less efficient usage of CPU caches, and so on.

Thus: if you really want to see improvements from using multiple threads, start with workloads that would definitely benefit from multiple threads. For example: when you connect to X websites, and download content ... then you will greatly benefit from doing that with multiple threads.

GhostCat
  • 137,827
  • 25
  • 176
  • 248
  • (a). Using shutdown hook might not be good Idea, I was doing that for litmus testing (b). I agree that synchronized list can hamper performance. But the "Task" takes some time(here Object creation time is time consuming) before adding new object to the list. I have only 4 threads, if one is creating object other might be adding to list and another might be calculating random number. So, I think using synchronized list here should not hamper performance much. – Shrirang Kumbhar Oct 23 '19 at 07:56
  • (c). Yes, my operations are CPU intensive, but I have 8 CPU cores that can run 8 threads in parallel, I am just using 4 cores(of course there are other apps that might be using other cores). If I run single thread the operations should be slow, but it is not. (d). So, is it not good idea to use threads for CPU intensive operations? – Shrirang Kumbhar Oct 23 '19 at 07:57
  • In the end all your threads want to push data into the same data structure. As said: 4 guys, but only one shovel. Sure, when the CPU intensive work results are independent of each other, then you want to to have as many threads as your CPU supports to run "real parallel". – GhostCat Oct 23 '19 at 08:28
0

With a syncronized list, your muti-threaded approach performing poorly is not surprising. Your main thread still has to create tasks and the task threads still have to wait for list access.

Instead consider splitting up list creation into 4 parallel jobs, and combine them to build the 10 million list of students.

Martin'sRun
  • 522
  • 3
  • 11
  • True, but this comes at the cost of **merging** the list. Which means again copying things together. I really think: creating a single list isn't something that can benefit much from multi threading. – GhostCat Oct 22 '19 at 18:20
  • Agree with @GhostCat, I thought of this before but, it involves coping of lists, which is also costly operation. – Shrirang Kumbhar Oct 23 '19 at 05:16