My object is to scrape data by using Java Selenium. I am able to load selenium driver, connect to the website and fetch the first column then go to the next pagination button until its become disable and write it to the console. Here is what I did so far:
public static WebDriver driver;
public static void main(String[] args) throws Exception {
System.setProperty("webdriver.chrome.driver", "E:\\eclipse-workspace\\package-name\\src\\working\\selenium\\driver\\chromedriver.exe");
System.setProperty("webdriver.chrome.silentOutput", "true");
driver = new ChromeDriver();
driver.get("https://datatables.net/examples/basic_init/zero_configuration.html");
driver.manage().window().maximize();
compareDispalyedRowCountToActualRowCount();
}
public static void compareDispalyedRowCountToActualRowCount() throws Exception {
try {
Thread.sleep(5000);
List<WebElement> namesElements = driver.findElements(By.cssSelector("#example>tbody>tr>td:nth-child(1)"));
System.out.println("size of names elements : " + namesElements.size());
List<String> names = new ArrayList<String>();
//Adding column1 elements to the list
for (WebElement nameEle : namesElements) {
names.add(nameEle.getText());
}
//Displaying the list elements on console
for (WebElement s : namesElements) {
System.out.println(s.getText());
}
//locating next button
String nextButtonClass = driver.findElement(By.id("example_next")).getAttribute("class");
//traversing through the table until the last button and adding names to the list defined about
while (!nextButtonClass.contains("disabled")) {
driver.findElement(By.id("example_next")).click();
Thread.sleep(1000);
namesElements = driver.findElements(By.cssSelector("#example>tbody>tr>td:nth-child(1)"));
for (WebElement nameEle : namesElements) {
names.add(nameEle.getText());
}
nextButtonClass = driver.findElement(By.id("example_next")).getAttribute("class");
}
//printing the whole list elements
for (String name : names) {
System.out.println(name);
}
//counting the size of the list
int actualCount = names.size();
System.out.println("Total number of names :" + actualCount);
//locating displayed count
String displayedCountString = driver.findElement(By.id("example_info")).getText().split(" ")[5];
int displayedCount = Integer.parseInt(displayedCountString);
System.out.println("Total Number of Displayed Names count:" + displayedCount);
Thread.sleep(1000);
// Actual count calculated Vs Dispalyed Count
if (actualCount == displayedCount) {
System.out.println("Actual row count = Displayed row Count");
} else {
System.out.println("Actual row count != Displayed row Count");
throw new Exception("Actual row count != Displayed row Count");
}
} catch (Exception e) {
e.printStackTrace();
}
}
I want to:
- scrape more than one column or may be selected columns for example on this LINK name, office and age column
- Then want to write these columns data in CSV file
Update
I tried like this but not running:
for(WebElement trElement : tr_collection){
int col_num=1;
List<WebElement> td_collection = trElement.findElements(
By.xpath("//*[@id=\"example\"]/tbody/tr[rown_num]/td[col_num]")
);
for(WebElement tdElement : td_collection){
rows += tdElement.getText()+"\t";
col_num++;
}
rows = rows + "\n";
row_num++;
}