
I have two I/O-intensive tasks that do very little computation: one fetches and parses a web page, and the other stores some of the parsed data in a database. Both repeat for as long as the crawl continues.

Is there a way to dynamically increase and decrease the number of threads working on each task so that performance is optimal for the machine the whole system is running on? The method should not involve benchmarking beforehand, because the system is going to be distributed to a number of machines I cannot access in advance.

Please guide me to some sources or information.

dimo414
Pedro Montoto García
  • Some guide http://parsec.cs.princeton.edu/publications/iiswc62-pusukuri.pdf – Pedro Montoto García May 17 '13 at 16:48
  • Use a fixed thread pool ([`Executors.newFixedThreadPool`](http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/Executors.html#newFixedThreadPool%28int%29)). For the other part, see [this post](http://stackoverflow.com/questions/1980832/java-how-to-scale-threads-according-to-cpu-cores). – Extreme Coders May 17 '13 at 16:52

2 Answers


Instead of managing threads directly, you should create a thread pool (an `ExecutorService`) and submit Runnables to it that do the actual work. From your description, a cached thread pool (`Executors.newCachedThreadPool()`) might be suitable, since it creates threads on demand and reuses idle ones. See http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ExecutorService.html for guidelines on how to use it.
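As a rough illustration of that suggestion, here is a minimal sketch that submits one Runnable per URL to a cached thread pool and waits for them to finish. The class name `CachedPoolSketch`, the helper `fetchTask`, and the example URLs are hypothetical, and the task body is only a placeholder for the real download and parse step.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class CachedPoolSketch {

    // Hypothetical stand-in for "fetch and parse one page".
    // Real code would perform the network I/O for `url` here.
    static Runnable fetchTask(String url, AtomicInteger completed) {
        return () -> {
            // ... download and parse `url` ...
            completed.incrementAndGet();
        };
    }

    // Submits one task per URL and returns how many tasks completed.
    static int crawl(String[] urls) {
        AtomicInteger completed = new AtomicInteger();
        ExecutorService pool = Executors.newCachedThreadPool();
        for (String url : urls) {
            pool.execute(fetchTask(url, completed));
        }
        pool.shutdown(); // accept no new tasks, let submitted ones finish
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                pool.shutdownNow(); // give up on tasks that take too long
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }

    public static void main(String[] args) {
        String[] urls = {"http://example.com/a", "http://example.com/b", "http://example.com/c"};
        System.out.println("completed tasks: " + crawl(urls));
    }
}
```

Note that a cached thread pool has no upper bound on its size, so it can create a large number of threads if tasks are submitted much faster than they complete.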

Emil L

Dynamically adjusting the thread count should be no problem, for example with `ThreadPoolExecutor`, whose core and maximum pool sizes can be changed at runtime.
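A minimal sketch of resizing a `ThreadPoolExecutor` while it is running, assuming the new size comes from some policy of your own (none is shown here). The class `ResizablePoolSketch` and its helpers `newPool` and `resize` are hypothetical names; `setCorePoolSize` and `setMaximumPoolSize` are the standard `ThreadPoolExecutor` methods.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class ResizablePoolSketch {

    // Creates a pool whose core and maximum sizes both start at `threads`,
    // backed by an unbounded work queue.
    static ThreadPoolExecutor newPool(int threads) {
        return new ThreadPoolExecutor(
                threads, threads, 30L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
    }

    // Changes the pool size at runtime. The order of the two setters matters:
    // the core size must never exceed the maximum size, so grow the maximum
    // first and shrink the core first.
    static void resize(ThreadPoolExecutor pool, int threads) {
        if (threads > pool.getMaximumPoolSize()) {
            pool.setMaximumPoolSize(threads);
            pool.setCorePoolSize(threads);
        } else {
            pool.setCorePoolSize(threads);
            pool.setMaximumPoolSize(threads);
        }
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = newPool(2);
        resize(pool, 8); // e.g. the download tasks are falling behind
        System.out.println("core size after growing: " + pool.getCorePoolSize());
        resize(pool, 3); // e.g. more threads stopped helping
        System.out.println("core size after shrinking: " + pool.getCorePoolSize());
        pool.shutdown();
    }
}
```

Shrinking does not interrupt running tasks; surplus threads are retired as they become idle.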

But it looks to me that the optimal number of threads is limited by two factors:

  1. The network bandwidth for your "downloading threads"
  2. The maximum number of allowed database connections for your "database threads"

I'm not sure if the downloading part should be multithreaded at all, because each thread will just steal bandwidth from the others unless the pages are really small.
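One way to respect the second limit, whatever size the pool has, is to guard the database work with a `Semaphore` that holds one permit per allowed connection. The sketch below is only an illustration: the class `ConnectionLimitSketch`, the method `runWrites`, and the `Thread.sleep` standing in for a real database write are all assumptions, and the actual connection limit would come from your database configuration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ConnectionLimitSketch {

    // Runs `tasks` simulated database writes with at most `maxConnections`
    // in progress at once, and returns the highest concurrency observed.
    static int runWrites(int tasks, int maxConnections) {
        Semaphore permits = new Semaphore(maxConnections);
        AtomicInteger active = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService pool = Executors.newCachedThreadPool();

        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                try {
                    permits.acquire(); // wait for a free "connection"
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                try {
                    int now = active.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    Thread.sleep(10); // stand-in for the real database write
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    active.decrementAndGet();
                    permits.release();
                }
            });
        }

        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("peak concurrent writes: " + runWrites(10, 2));
    }
}
```

A simpler alternative under the same assumption is a fixed thread pool whose size equals the connection limit; the semaphore variant is useful when the pool is shared or resized.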

weaselflink