2

My program has an ArrayList of websites. For each one it does I/O with image processing, scrapes data from the site, and updates/inserts into a database. Right now it is slow because of all the I/O being done. I would like to speed this up by running my program with threads. Nothing is ever removed from the list, and every website in the list is independent of the others, so it seems okay to me to have several instances looping through the list at the same time.

Let's say my list has 10 websites. Right now, of course, it loops through positions 0 through 9 until all websites have been processed.

And let's say I want 3 threads looping through this list of 10 websites at once, each doing all the I/O and database updates in its own separate space at the same time, but using the same list.

website.get(0) // thread1
website.get(1) // thread2
website.get(2) // thread3

Then say thread2 reaches the end of the loop first; it comes back and works on the next position

website.get(3) // thread2

Then thread3 completes and gets the next position

website.get(4) // thread3

and then thread1 finally completes and works on the next position

website.get(5) // thread1

etc until it's done. Is this easy to set up? Is there somewhere I can find a good example of it being done? I've looked online to try to find somewhere else talking about my scenario but I haven't found it.

spongebob
Jay
  • Use a fixed thread pool of size 3. You do know, though, that your total processing speed may not be changed by threading. – Hovercraft Full Of Eels Apr 10 '15 at 16:17
  • The total processing speed can be _significantly_ faster with multithreading, especially if networking is involved. Imagine if your web browser had to download every single image one after the other: loading a page would take ages. – Jonas Czech Apr 10 '15 at 16:24

3 Answers

2

In my app, I use ExecutorService like this, and it works well:

Main code:

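import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

ExecutorService pool = Executors.newFixedThreadPool(3); // number of concurrent threads

for (String name : website) { // your ArrayList
    pool.submit(new DownloadTask(name, toPath));
}

pool.shutdown(); // stop accepting new tasks; already submitted tasks still run
// Waits until all tasks finish or the timeout elapses (returns false on timeout); adjust as needed.
pool.awaitTermination(5, TimeUnit.SECONDS);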

The actual class where you do the work:

private static class DownloadTask implements Runnable {

    private final String name;
    private final String toPath;

    public DownloadTask(String name, String toPath) {
        this.name = name;
        this.toPath = toPath;
    }

    @Override
    public void run() {
        //Do your parsing / downloading / etc. here.

    }
}

Some cautions:

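  • If you are using a database, you have to make sure concurrent writes are safe: don't share a single connection between threads, or otherwise ensure that you don't have two threads writing to that database at the same time.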

See here for more info.
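
One way to follow the caution above, sketched under assumptions: the scraping runs on the pool of 3, while every write is handed to a separate single-thread executor, so writes happen one at a time. The class name `SerializedWritesSketch`, the method `scrapeAndStore`, and the in-memory list standing in for the database are all hypothetical; real code would do its inserts/updates where `table.add(row)` is.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class SerializedWritesSketch {

    static List<String> scrapeAndStore(List<String> websites) {
        // Stand-in for the database (an assumption for this sketch): a plain,
        // non-thread-safe list that only the single writer thread modifies.
        List<String> table = new ArrayList<>();
        ExecutorService scrapers = Executors.newFixedThreadPool(3);
        ExecutorService dbWriter = Executors.newSingleThreadExecutor();

        for (String site : websites) {
            scrapers.execute(() -> {
                String row = "scraped:" + site;          // placeholder for the slow I/O
                dbWriter.execute(() -> table.add(row));  // queued; writes run one at a time
            });
        }

        try {
            scrapers.shutdown();                          // no new scrape tasks
            scrapers.awaitTermination(1, TimeUnit.MINUTES);
            dbWriter.shutdown();                          // already queued writes still run
            dbWriter.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();           // preserve the interrupt status
        }
        return table;
    }

    public static void main(String[] args) {
        List<String> rows = scrapeAndStore(Arrays.asList("a.example", "b.example", "c.example"));
        System.out.println(rows.size() + " rows written");
    }
}
```

The order of the rows depends on which scrape finishes first, so it can differ between runs. If the database and its driver already handle concurrent transactions on separate connections, this extra executor may be unnecessary.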

Jonas Czech
  • Thanks for the code. I will try this and then update this comment if it works – Jay Apr 10 '15 at 16:38
  • This seems to be working. But I'm not sure yet about database updates/inserts because no new info has been found yet. I'm using Spring for transactions but I don't know if that helps or not. The scraping is running in parallel though! Thanks – Jay Apr 10 '15 at 17:19
  • @Jay I don't have a clue about Spring, so can't help you there. But it should be fairly easy. If you have any further questions, don't hesitate to ping me! – Jonas Czech Apr 10 '15 at 17:25
0

As mentioned in the other comments/answer, you just need a fixed-size thread pool executor (say 3, as per your example) whose 3 threads iterate over the same list without picking up duplicate websites.

So apart from the thread pool executor, you probably also need to work out the next index correctly in each thread, so that no thread picks up the same element as another and no element is missed.

Hence I think you can use a BlockingQueue instead of a list, which eliminates the index calculation and guarantees that each element is handed to exactly one thread.

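import java.util.Arrays;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class WebsitesHandler {

    public static void main(String[] args) {
        // Fill the queue with your websites before starting the workers,
        // otherwise they find it empty and return after the poll timeout.
        BlockingQueue<Object> websites = new LinkedBlockingQueue<>();
        ExecutorService executorService = Executors.newFixedThreadPool(3);
        Worker[] workers = new Worker[3];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Worker(websites);
        }
        try {
            // Blocks until all three workers have returned.
            executorService.invokeAll(Arrays.asList(workers));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            executorService.shutdown(); // otherwise the pool threads keep the JVM alive
        }
    }

    private static class Worker implements Callable<String> {

        private final BlockingQueue<Object> websites;

        public Worker(BlockingQueue<Object> websites) {
            this.websites = websites;
        }

        @Override
        public String call() {
            try {
                Object website;
                // poll() removes the head, so each element goes to exactly one worker;
                // null means the queue stayed empty for 1 second.
                while ((website = websites.poll(1, TimeUnit.SECONDS)) != null) {
                    // execute the task
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "done";
        }
    }
}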
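
If you would rather keep the ArrayList, the index calculation mentioned above can be sketched with a shared AtomicInteger: each worker atomically claims the next free position, which is the pattern described in the question. This is only a sketch; the class name `SharedIndexSketch` and the method `processAll` are hypothetical, and adding to `processed` is a placeholder for the real scraping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class SharedIndexSketch {

    /** Processes every element exactly once using the given number of worker threads. */
    static List<String> processAll(List<String> websites, int threads) {
        AtomicInteger next = new AtomicInteger(0); // next unclaimed position in the list
        ConcurrentLinkedQueue<String> processed = new ConcurrentLinkedQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);

        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                int i;
                // getAndIncrement() is atomic, so no two workers claim the same index.
                while ((i = next.getAndIncrement()) < websites.size()) {
                    String site = websites.get(i); // read-only access to the shared list
                    processed.add(site);           // placeholder for the real work
                }
            });
        }

        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(processed);
    }

    public static void main(String[] args) {
        List<String> websites = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            websites.add("site" + i);
        }
        System.out.println(processAll(websites, 3).size() + " websites processed");
    }
}
```

This relies on the list not being modified while the workers run, which matches the statement in the question that nothing is ever removed from it.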
hemant1900
0

I think you should update to the latest version of Java, i.e. Java 8,

and study the Streams API. A parallel stream should solve your problem.
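
A minimal sketch of what that could look like with a parallel stream, assuming a hypothetical `scrape` method standing in for the real work (the class name `ParallelStreamSketch` is also made up for illustration):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class ParallelStreamSketch {

    // Placeholder for the real scraping; returns a fake result for the given site.
    static String scrape(String site) {
        return "scraped:" + site;
    }

    static List<String> scrapeAll(List<String> websites) {
        return websites.parallelStream()            // elements may be processed on several threads
                .map(ParallelStreamSketch::scrape)
                .collect(Collectors.toList());      // results keep the order of the source list
    }

    public static void main(String[] args) {
        List<String> websites = Arrays.asList("a.example", "b.example", "c.example");
        System.out.println(scrapeAll(websites));
    }
}
```

Note that parallel streams run on the common ForkJoinPool, whose size is derived from the number of available processors by default, so you don't directly choose "3 threads" here. For blocking I/O, an explicit fixed thread pool gives more control.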

Abhishek Mishra