In another question on Stack Overflow I was given the hint that I could use a thread pool for the producer-consumer pattern my crawlers create. However, I cannot figure out how to implement it.
In a producer-consumer thread on SO, the pattern is only used to manage the producers and consumers themselves (which in my case would be the crawlers, and that is not much different from my for loop). But that does not seem to be what the commenter on my question intended (he could not have seen that I was using a for loop); the workload there is still shared via a queue.
I also thought about passing a `Website` object to `ExecutorService.submit()` with this implementation (and removing `Runnable` from `Crawler`):
```java
public class Website implements Runnable {
    private final URL url;

    public Website(URL url) {
        this.url = url;
    }

    @Override
    public void run() {
        Crawler crawler = new Crawler();
        crawler.crawl(url);
    }
}
```
But there are two problems with that:

- I think far too many crawlers would be created (one per website)
- `Crawler()` expects a queue of already visited websites
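The second point I could presumably solve by passing the shared visited queue into each task instead of creating it inside the task. Something like this sketch (the `Crawler` here is a minimal stub standing in for my real class, under the assumption that it takes the visited queue as a constructor argument):

```java
import java.net.URL;
import java.util.Queue;

// Minimal stand-in for the real Crawler class, just so the sketch compiles;
// the assumption is that it accepts the shared visited queue in its constructor.
class Crawler {
    private final Queue<URL> visited;

    Crawler(Queue<URL> visited) {
        this.visited = visited;
    }

    void crawl(URL url) {
        visited.add(url);  // the real crawler would fetch and parse the page here
    }
}

public class Website implements Runnable {
    private final URL url;
    private final Queue<URL> visited;  // shared by every task, not created per task

    public Website(URL url, Queue<URL> visited) {
        this.url = url;
        this.visited = visited;
    }

    @Override
    public void run() {
        new Crawler(visited).crawl(url);
    }
}
```

That way every submitted task sees the same visited queue, but it still does not solve the first point: one task (and one `Crawler`) per website.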
How can I properly implement the producer-consumer pattern for my crawler? I am getting totally confused; every site I have checked seems to use it differently.
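To make my current understanding concrete, here is roughly what I imagine it could look like: a fixed pool of workers that are both consumers (they take URLs from a shared `BlockingQueue`) and producers (they put newly discovered links back on it), with a concurrent set playing the role of the visited queue. All names and the in-memory `web` map are placeholders I made up, not my real classes:

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class CrawlerPool {
    // The shared frontier: workers consume URLs from it *and* produce new
    // ones into it, so every worker is both a producer and a consumer.
    private final BlockingQueue<String> frontier = new LinkedBlockingQueue<>();
    // Shared "already visited" set (assumption: this plays the role of the
    // visited queue my Crawler expects).
    private final Set<String> visited = ConcurrentHashMap.newKeySet();
    // Hypothetical in-memory "web" standing in for real HTTP fetching.
    private final Map<String, List<String>> web;
    private final int nThreads;

    public CrawlerPool(Map<String, List<String>> web, int nThreads) {
        this.web = web;
        this.nThreads = nThreads;
    }

    public Set<String> crawlFrom(String seed) throws InterruptedException {
        frontier.add(seed);
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        for (int i = 0; i < nThreads; i++) {
            pool.submit(this::workLoop);  // fixed number of workers, not one per URL
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return visited;
    }

    private void workLoop() {
        try {
            String url;
            // Simplistic termination: stop once the frontier stays empty for
            // 500 ms. Good enough for a sketch; a production crawler would
            // track in-flight work instead.
            while ((url = frontier.poll(500, TimeUnit.MILLISECONDS)) != null) {
                if (!visited.add(url)) {
                    continue;  // another worker already claimed this URL
                }
                for (String link : web.getOrDefault(url, List.of())) {
                    if (!visited.contains(link)) {
                        frontier.add(link);  // produce new work for the pool
                    }
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Is something along these lines the intended way to use a thread pool here, or am I misunderstanding the pattern?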