Suppose I have a CSV file with hundreds of lines, each containing two random keywords as cells. I'd like to Google search each pair and have the first result on the page printed to the console or stored in some array. I imagine I could do this by reading one line at a time, using something like the following:

CSVReader reader = new CSVReader(new FileReader(FILE_PATH));
String[] nextLine;
// read one row at a time; each row holds the two keywords
while ((nextLine = reader.readNext()) != null) {
    driver.get("http://google.com/");
    driver.findElement(By.name("q")).click();
    driver.findElement(By.name("q")).clear();
    driver.findElement(By.name("q")).sendKeys(nextLine[0] + " " + nextLine[1]);
    // print the text of the first result rather than the WebElement object
    System.out.println(driver.findElement(By.xpath(XPATH_TO_1ST)).getText());
}

How would I go about having 5 (or however many) chromedriver threads process the CSV file through Selenium as fast as possible? I've been able to get 5 lines done at a time by implementing Runnable on a class that does this and starting 5 threads. What I'd like is a solution where, as soon as one thread finishes, it picks up the next unprocessed line, instead of waiting for all 5 searches to finish before moving on to the next 5 lines. I would appreciate any suggested reading or tips on cracking this!

3 Answers

This is a pure Java answer rather than a Selenium-specific one.

You want to partition the data. A crude but effective partitioner reads each row from the CSV file and puts it in a Queue. Then run as many threads as you can profitably use, each one simply pulling the next entry off the queue and processing it.
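A minimal sketch of that queue approach, using only the JDK. The class name `QueueWorkers` and the `process` method are hypothetical; `process` is a stand-in for the real per-row work (the Selenium search in the question), and the rows are assumed to be already read from the CSV.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

class QueueWorkers {
    // Hypothetical stand-in for the real work done on one CSV row.
    static String process(String[] row) {
        return row[0] + " " + row[1];
    }

    // Puts every row on a queue, then lets workerCount threads drain it.
    static List<String> processAll(List<String[]> rows, int workerCount) {
        Queue<String[]> queue = new ConcurrentLinkedQueue<>(rows);
        List<String> results = Collections.synchronizedList(new ArrayList<>());
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < workerCount; i++) {
            Thread worker = new Thread(() -> {
                String[] row;
                // poll() returns null once the queue is empty, which ends this worker
                while ((row = queue.poll()) != null) {
                    results.add(process(row));
                }
            });
            workers.add(worker);
            worker.start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return results;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"red", "apple"});
        rows.add(new String[] {"blue", "whale"});
        rows.add(new String[] {"green", "tea"});
        // The order of the results depends on thread scheduling.
        System.out.println(processAll(rows, 2));
    }
}
```

Because each worker asks for the next row as soon as it finishes its current one, a slow row only holds up the thread that is working on it.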

pojo-guy

If you want to run 5 (or more) threads at the same time, you need to start 5 instances of WebDriver, because a WebDriver instance is not thread-safe. As for updating the CSV, you would need to synchronize each thread's writes to prevent corruption of the file itself, or you could batch up updates until some threshold is reached and write several lines at once.
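The synchronized, batched writing mentioned above might look roughly like this. `BatchedLineWriter` is a hypothetical helper, not part of Selenium or any CSV library, and the sketch writes to a `StringWriter` so it runs on its own; a `FileWriter` would presumably take its place for a real file.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: collects result lines from many threads and writes them in batches.
class BatchedLineWriter {
    private final Writer out;
    private final int batchSize;
    private final List<String> pending = new ArrayList<>();

    BatchedLineWriter(Writer out, int batchSize) {
        this.out = out;
        this.batchSize = batchSize;
    }

    // synchronized: only one thread at a time may touch the buffer or the writer
    synchronized void add(String line) {
        pending.add(line);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Writes everything buffered so far; call once more after all threads finish.
    synchronized void flush() {
        try {
            for (String line : pending) {
                out.write(line);
                out.write("\n");
            }
            out.flush();
            pending.clear();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StringWriter sink = new StringWriter(); // assumption: a FileWriter would go here for a real CSV
        BatchedLineWriter writer = new BatchedLineWriter(sink, 2);
        writer.add("red,apple,first result");
        writer.add("blue,whale,first result"); // reaches the threshold, so both lines are written
        writer.add("green,tea,first result");
        writer.flush(); // writes the leftover line
        System.out.print(sink);
    }
}
```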

See this question: "Can Selenium use multi threading in one browser?"

Update:

How about this? It ensures that no WebDriver instance is used by two threads at the same time.

CSVReader reader = new CSVReader(new FileReader(FILE_PATH));

// number to do at same time
int concurrencyCount = 5;
ExecutorService executorService = Executors.newFixedThreadPool(concurrencyCount);
CompletionService<Boolean> completionService = new ExecutorCompletionService<Boolean>(executorService);
String[] nextLine;

// ensure we use a distinct WebDriver instance per thread
final LinkedBlockingQueue<WebDriver> webDrivers = new LinkedBlockingQueue<WebDriver>();
for (int i=0; i<concurrencyCount; i++) {
    webDrivers.offer(new ChromeDriver());
}
int count = 0;
while ((nextLine = reader.readNext()) != null) {
    final String [] line = nextLine;
    completionService.submit(new Callable<Boolean>() {
        public Boolean call() {
            WebDriver driver = null;
            try {
                // take a webdriver from the queue to use
                driver = webDrivers.take();
                driver.get("http://google.com/");
                driver.findElement(By.name("q")).click();
                driver.findElement(By.name("q")).clear();
                driver.findElement(By.name("q")).sendKeys(line[0] + " " + line[1]);
                System.out.println(line[1]);
                // assumes each row has a third cell to hold the result
                line[2] = driver.findElement(By.xpath(XPATH_TO_1ST)).getText();
                return true;
            } catch (InterruptedException e) {
                // restore the interrupt flag so the pool can see it
                Thread.currentThread().interrupt();
                e.printStackTrace();
                return false;
            } finally {
                // always put the webdriver back on the queue, even if a lookup throws
                if (driver != null) {
                    webDrivers.offer(driver);
                }
            }
        }
    });
    count++;
}

boolean errors = false;
while (count-- > 0) {
    Future<Boolean> resultFuture = completionService.take();
    try {
        // a task that returned false also counts as an error
        if (!resultFuture.get()) {
            errors = true;
        }
    } catch (Exception e) {
        e.printStackTrace();
        errors = true;
    }
}
System.out.println("done, errors=" + errors);
// quit() ends the whole browser session; close() only closes the current window
for (WebDriver webDriver : webDrivers) {
    webDriver.quit();
}
executorService.shutdown();
tom
  • I may be wording this incorrectly when I say I have 5 threads of WebDriver, but in my experience so far, having 5 lines of `WebDriver driver = new ChromeDriver()` outside of a class that isn't implementing Runnable starts 5 instances sequentially, as opposed to all at once, which is my goal. I am going to update the question to mention that appending something such as "Search processed" to each line was something I realized I wanted to learn as I was typing the question, but my main priority is to search through the terms in the CSV as fast as possible with 5 WebDrivers. Thanks for the input so far @tom ! –  Sep 20 '18 at 23:12
  • Yes, this works great for what I'm trying to accomplish. I used the initial code you provided and am going to go through it to understand everything you're doing here, but I appreciate the working example you added. Do you have anything to add for the case where an error, or an if/else clause I add, is triggered by one of the threads and should pause all the threads, open another WebDriver instance/thread with a set of instructions, and, when that is complete, resume the rest of the threads? –  Sep 21 '18 at 17:28
  • Some of the things I can think of, use [WebDriverWait](https://stackoverflow.com/questions/11736027/webdriver-wait-for-element-using-java) in case Google is slow (never will happen though). Also in the catch block in the Callable, maybe try again instead of just giving up (so add a loop with retries). But I think add error handling organically as you find issues. It's not terribly complicated. The coolest thing is re-using WebDriver instances by pushing and popping them onto a queue IMHO to avoid re-use by another thread. :-) – tom Sep 22 '18 at 12:26
  • I am running into what I believe is a memory leak when using the initial solution that you (@tom) later replaced: after around 3000 iterations, the webdrivers, even at 4 MB apiece, eat away at my RAM and my computer slows to a halt. I've tried adding `driver.close(); driver.quit();` but still run into this issue. Not sure if this is more a Selenium issue than a Java issue. –  Oct 13 '18 at 21:44
  • One more note: when I add an if statement, the instance ends and another thread starts. To go along with the Google search example, I would like the thread to do something else if, say, my example search query results in a search suggestion. I have tried making it so that if Selenium finds the element showing a suggestion, it executes what's inside the if block, and otherwise the code moves on, e.g. `if (driver.findElements(By.xpath("xpath/to/suggestion/")).size() != 0) { do stuff }`. This approach has worked in my non-threaded classes but for some reason is breaking here. –  Oct 13 '18 at 22:09
  • For your memory issue, I suggest using JVisualVM to analyze the memory. Limit it to 2 concurrent instances to keep things simple. You can snapshot memory with JVisualVM after each iteration (with some breakpoints) and use that to determine what is leaking. As to your other question, maybe just make sure you're not calling webDrivers.offer() too soon? I don't have your whole code, but I think you just need to do some digging... – tom Oct 15 '18 at 04:05
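
The retry loop suggested in the comments above could be sketched roughly as follows. The `Retry` class and `withRetries` method are hypothetical names; in the answer's code, the body of the Callable would presumably be the task passed in.

```java
import java.util.concurrent.Callable;

class Retry {
    // Runs the task up to maxAttempts times and returns the first successful result.
    static <T> T withRetries(Callable<T> task, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (InterruptedException e) {
                // do not keep retrying once the thread has been asked to stop
                Thread.currentThread().interrupt();
                last = e;
                break;
            } catch (Exception e) {
                last = e; // remember the failure and try again
            }
        }
        throw new IllegalStateException("gave up after " + maxAttempts + " attempts", last);
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Fails twice, then succeeds on the third attempt.
        String result = withRetries(() -> {
            if (++calls[0] < 3) {
                throw new RuntimeException("flaky");
            }
            return "ok after " + calls[0] + " attempts";
        }, 5);
        System.out.println(result); // prints "ok after 3 attempts"
    }
}
```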

You can create a Callable for each row and submit it to an ExecutorService, which takes care of executing the tasks and manages the worker threads for you. Choose the thread pool size carefully for optimal execution time.
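
A minimal sketch of that idea, assuming the rows are already in memory. `RowTasks` and `search` are hypothetical names, and `search` only stands in for the real per-row work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class RowTasks {
    // Hypothetical stand-in for the per-row work.
    static String search(String[] row) {
        return row[0] + " " + row[1];
    }

    // Creates one Callable per row and runs them on a fixed-size pool.
    static List<String> runAll(List<String[]> rows, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String[] row : rows) {
                tasks.add(() -> search(row));
            }
            List<String> results = new ArrayList<>();
            // invokeAll blocks until every task is done and keeps the input order
            for (Future<String> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"red", "apple"});
        rows.add(new String[] {"blue", "whale"});
        rows.add(new String[] {"green", "tea"});
        System.out.println(runAll(rows, 2)); // prints [red apple, blue whale, green tea]
    }
}
```

The pool size here is just a parameter. For browser-driven work, the useful value is likely bounded by how many browser instances the machine can run, not by CPU core count, so it is worth measuring.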

More information about thread pool size can be found here

samzz