I am writing a multithreaded webcrawler, where there is one WebCrawler object which uses an ExecutorService to process WebPages and extract anchors from each page. I have a method defined in the WebCrawler class which can be called by WebPages to add extracted sublinks to the WebCrawler's Set of nextPagestoVisit, and the method currently looks like this:
    public synchronized void addSublinks(Set<WebPage> sublinks) {
        this.nextPagestoVisit.addAll(sublinks);
    }
Currently I am using a synchronized method. However, I am considering other possible options.
Making the Set a synchronizedSet:
    public Set<WebPage> nextPagestoVisit = Collections.synchronizedSet(new HashSet<WebPage>());
Making the Set volatile:
    public volatile Set<WebPage> nextPagestoVisit = new HashSet<WebPage>();
Is each of these two alternatives sufficient on its own? (I am assuming that the synchronized method approach is sufficient.) Or would I have to combine them with other safety measures? If they all work, which one would be the best approach? If one or both do not work, please provide a short explanation of why (i.e. what kind of scenario would cause problems). Thanks.
Edit: To be clear, my goal is to ensure that if two WebPages both try to add their sublinks at the same time, one write will not be overwritten by the other (i.e. all sublinks will successfully be added to the Set).
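A minimal sketch of that scenario, to make the goal concrete. It uses the synchronized-method version from above and makes several assumptions that are not part of the real crawler: String stands in for WebPage, and the class name CrawlerSketch, the pendingCount helper, the runScenario method, the pool size, and the link names are all hypothetical.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class CrawlerSketch {
    // Stand-in for the crawler's Set<WebPage>; String replaces WebPage (assumption).
    private final Set<String> nextPagestoVisit = new HashSet<>();

    // Same shape as the method in the question: the lock is the crawler instance.
    public synchronized void addSublinks(Set<String> sublinks) {
        this.nextPagestoVisit.addAll(sublinks);
    }

    // Hypothetical helper; reads under the same lock as the writes.
    public synchronized int pendingCount() {
        return nextPagestoVisit.size();
    }

    // Submits `tasks` concurrent jobs, each adding `linksPerTask` distinct links,
    // and returns how many links ended up in the set.
    static int runScenario(int tasks, int linksPerTask) throws InterruptedException {
        CrawlerSketch crawler = new CrawlerSketch();
        ExecutorService pool = Executors.newFixedThreadPool(4); // pool size is arbitrary
        for (int t = 0; t < tasks; t++) {
            final int page = t;
            pool.submit(() -> {
                Set<String> sublinks = new HashSet<>();
                for (int i = 0; i < linksPerTask; i++) {
                    sublinks.add("page" + page + "/link" + i);
                }
                crawler.addSublinks(sublinks);
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return crawler.pendingCount();
    }

    public static void main(String[] args) throws InterruptedException {
        // Every link is distinct, so no lost writes means tasks * linksPerTask entries.
        System.out.println(runScenario(8, 1000));
    }
}
```

The success criterion is that the returned count equals tasks * linksPerTask. Swapping the field declaration for one of the two alternatives above and rerunning the same scenario is one way to probe them, although a run that happens to pass does not prove an approach is thread-safe.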