I am trying to extract the first 5 URLs from a Google search results page using Selenium WebDriver. Firefox opens and the page loads, but my regex does not match any of the URLs on the page. How do I extract the URLs?
This is the code I have so far:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

public class Weburlext {
    public static void main(String[] args) {
        WebDriver driver = new FirefoxDriver();
        driver.get("http://www.google.co.in/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=sample%20data");
        // HTML of the loaded page as Selenium sees it; this is the text the regex runs against
        String line = driver.getPageSource();
        // URL pattern; the @...@i wrapper is PHP delimiter syntax, which Java treats as literal '@' characters
        String regex = "@^(http\\:\\/\\/|https\\:\\/\\/)?([a-z0-9][a-z0-9\\-]*\\.)+[a-z0-9][a-z0-9\\-]*$@i";
        Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
        Matcher m = p.matcher(line);
        // Print every match found in the page source
        while (m.find()) {
            System.out.println(m.group());
        }
        driver.quit();
    }
}
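A minimal sketch of the regex side, assuming the HTML string comes from `driver.getPageSource()` and that only absolute `http`/`https` URLs are wanted. It drops the PHP-style `@...@i` delimiters (Java passes flags to `Pattern.compile` instead), removes the `^`/`$` anchors so a URL can match anywhere inside a larger page, and uses `find()` in a loop to collect up to 5 distinct matches. The class name `UrlExtractSketch`, the method `firstUrls`, and the simplified pattern are hypothetical illustrations, not a tested Google-specific solution; links on a real results page may be relative or wrapped, so this can pick up more or fewer links than the visible results.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlExtractSketch {
    // No "@...@i" delimiters in Java: case-insensitivity is passed as a flag.
    // No ^/$ anchors, so the pattern can match anywhere inside a larger HTML string.
    // Assumption: only absolute http(s) URLs are wanted; relative links are ignored.
    private static final Pattern URL = Pattern.compile(
            "https?://[a-z0-9][a-z0-9-]*(?:\\.[a-z0-9][a-z0-9-]*)+[^\\s\"'<>]*",
            Pattern.CASE_INSENSITIVE);

    /** Returns up to {@code limit} distinct URLs in the order they first appear in {@code html}. */
    static List<String> firstUrls(String html, int limit) {
        Set<String> found = new LinkedHashSet<>(); // keeps insertion order, drops duplicates
        Matcher m = URL.matcher(html);
        // find() scans forward to the next match; matches() would require the whole string to match
        while (found.size() < limit && m.find()) {
            found.add(m.group());
        }
        return new ArrayList<>(found);
    }

    public static void main(String[] args) {
        // Stand-in for driver.getPageSource(); a real results page is far larger
        String html = "<a href=\"https://example.com/a\">A</a>"
                + "<a href=\"http://example.org/b?x=1\">B</a>"
                + "<a href=\"https://example.com/a\">A again</a>";
        for (String url : firstUrls(html, 5)) {
            System.out.println(url);
        }
    }
}
```

Running regexes over raw HTML is fragile. An alternative worth trying is to let Selenium parse the page, e.g. `driver.findElements(By.tagName("a"))` and then `getAttribute("href")` on each element, filtering down to the result links you care about.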