I'm trying to write a crawler that extracts the menu items from a website using regex in Java. The page URL is http://www.dinebombaygarden.com/appetizers.html
How can I get the menu item names (Vegetable Pakora, Onion or Spinach or Potato Pakora, ...) using Pattern and Matcher?
My code is as follows, but it is not working well:
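import java.io.IOException;
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public ArrayList<String> getMenuItems(String menuURL, String menuRegex) throws IOException {
    ArrayList<String> items = new ArrayList<String>();
    // Fetch the page with GET; a static HTML page should not need a POST request.
    Document doc = Jsoup.connect(menuURL).get();
    // Flatten the body to plain text (all tags stripped) and print it for debugging.
    String text = doc.body().text();
    System.out.println(text);
    Pattern pattern = Pattern.compile(menuRegex);
    Matcher matcher = pattern.matcher(text);
    // Collect every substring that matches the whole pattern.
    while (matcher.find()) {
        items.add(matcher.group());
    }
    return items;
}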
String menuURL = "http://www.dinebombaygarden.com/appetizers.html";
String menuRegex = "[A-Z][a-z]+.{10,50}[$]\\s[\\d.]+.95";
The menuRegex above does not match the menu items correctly. Can anyone help with this issue?
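To make clearer what I am after, here is a minimal sketch of the kind of extraction I want, run on a made-up sample string rather than the real page text. It assumes each item name is immediately followed by a price of the form "$ 4.95" and that names contain only letters and spaces; the class name MenuRegexSketch, the method extractNames, and the sample text are hypothetical and only for illustration. A capturing group returns just the name, and the dot in the price is escaped so that it only matches a literal ".".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MenuRegexSketch {
    // Assumed layout: "<Name> $ <digits>.<two digits>".
    // Group 1 is the name: a capital letter followed lazily by letters and spaces,
    // so it stops at the first price that follows it.
    private static final Pattern ITEM =
            Pattern.compile("([A-Z][A-Za-z ]*?)\\s*\\$\\s?\\d+\\.\\d{2}");

    public static List<String> extractNames(String text) {
        List<String> names = new ArrayList<>();
        Matcher m = ITEM.matcher(text);
        while (m.find()) {
            // group(1) is only the name; group() would also include the price.
            names.add(m.group(1).trim());
        }
        return names;
    }

    public static void main(String[] args) {
        // Made-up sample text, not the actual output of doc.body().text().
        String sample = "Vegetable Pakora $ 4.95 Onion or Spinach or Potato Pakora $ 4.95";
        System.out.println(extractNames(sample));
    }
}
```

If the real page text has a description with commas or other punctuation between the name and the price, this pattern would pick up only the last run of letters and spaces that starts with a capital letter before each price, so it would need adjusting.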
Thank you very much.