I'm using Nutch to crawl a website and am currently writing a plugin. I use Jaunt 1.0.0.1 to parse the HTML. For example, I have this line:

Element infoBooksItem = body.findFirst("<div class=info_books_item>");

This throws an error when the page has no <div class=info_books_item>. I've been looking through the Jaunt JavaDocs, but I can't figure out how to check whether such an element exists.

Katka

1 Answer

You are correct that the findFirst method throws an exception if the element is not found. You can wrap the call in a try-catch block that catches NotFound and take it from there, or, if you just need a boolean check, write a helper method that does not throw:

public boolean has(Element element, String target){
  try{
    element.findFirst(target);
    return true;
  }
  catch(NotFound n){
    return false;
  }
}

Alternatively, you can use the findEvery method, which does not throw an Exception, as a boolean detector:

if(body.findEvery("<div class=info_books_item>").size() > 0){
  // at least one matching element exists; handle it here
}
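If you need the element itself rather than just a yes/no answer, the has helper above makes you look it up a second time. One way around that is to return an Optional instead of a boolean. The sketch below is only an illustration of that pattern: tryFind and findFirstStandIn are hypothetical names, and Jaunt is deliberately not imported so the example runs on its own. With Jaunt you would pass something like () -> body.findFirst("<div class=info_books_item>") as the lookup and narrow the catch to NotFound.

```java
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.Callable;

public class OptionalLookup {

    // Hypothetical stand-in for a throwing lookup such as Jaunt's findFirst:
    // returns the target if it is "on the page", otherwise throws.
    static String findFirstStandIn(String target) throws Exception {
        Set<String> page = Set.of("<div class=info_books_item>");
        if (!page.contains(target)) {
            throw new Exception("not found: " + target);
        }
        return target;
    }

    // Hypothetical helper: runs a lookup that may throw and turns a failure
    // into an empty Optional. With Jaunt, catch NotFound instead of Exception
    // so that unrelated errors are not swallowed.
    static <T> Optional<T> tryFind(Callable<T> lookup) {
        try {
            return Optional.ofNullable(lookup.call());
        } catch (Exception e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        Optional<String> hit =
            tryFind(() -> findFirstStandIn("<div class=info_books_item>"));
        Optional<String> miss =
            tryFind(() -> findFirstStandIn("<div class=missing>"));

        // Only act on the element when it was actually found.
        hit.ifPresent(e -> System.out.println("found " + e));
        System.out.println("missing present? " + miss.isPresent());
    }
}
```

The caller then uses ifPresent or isPresent on the result, so the lookup happens once and the missing case is handled without a try-catch at every call site.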