I use the Selenium WebDriver for C# and for Python to extract data elements from websites, but the scraping is terribly slow: scraping 35,000 data tables took me about 1.5 days. With the Selenium WebDriver I can execute JavaScript to get a page element. Is there a library that can execute JavaScript on a webpage to retrieve elements, and can click on elements as well, without requiring something like a WebDriver? Or is there a faster alternative to Selenium?
-
Have you tried PhantomJS? See this question: http://stackoverflow.com/questions/8049520/web-scraping-javascript-page-with-python – LittlePanda Apr 16 '15 at 09:57
-
Yes, I've tried PhantomJS for Selenium and it is faster than the ChromeDriver for Selenium. I also found out that I can get the table directly by extracting the text between the table's enclosing tags (such as <tbody>), instead of extracting it from each table element individually: driver.find_element_by_tag_name("td") -> driver.find_element_by_tag_name("tbody").
– Robert Smit Jun 17 '15 at 12:41
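The idea in the comment above (one browser round trip per table instead of one per cell) could be sketched roughly as follows in Python. This is only an illustration under assumptions: the names `TableRows` and `parse_rows` are made up for this example, and the HTML string is assumed to come from something like `driver.find_element_by_tag_name("tbody").get_attribute("innerHTML")`, which is not shown running here. The parsing itself uses only the standard library.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # one list of cell strings per <tr>
        self._cell = None  # text pieces of the cell being read, else None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None


def parse_rows(tbody_html):
    """Hypothetical helper: turn the inner HTML of a <tbody> into rows."""
    parser = TableRows()
    parser.feed(tbody_html)
    parser.close()
    return parser.rows


# Assumed usage: the string would be fetched once per table via Selenium.
sample = "<tr><td>a</td><td> b </td></tr><tr><td>1</td><td>2</td></tr>"
rows = parse_rows(sample)
```

With this shape, the slow part (talking to the browser) happens once per table, and the per-cell work is done locally on a string.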
4 Answers
I suggest you use TestCafe.
TestCafe is a free, open-source framework for functional (e2e) web testing. It is based on Node.js and doesn't use WebDriver at all.
TestCafe-powered tests are executed on the server side. To obtain DOM elements, TestCafe provides a powerful, flexible system of Selectors. TestCafe can execute JavaScript on the tested webpage using the ClientFunction feature (see our Documentation).
TestCafe tests run very fast (see for yourself), and the speed does not come at the cost of stability, thanks to a built-in smart wait system.
Installation of TestCafe is very easy:
1) Check that you have Node.js on your PC (or install it).
2) To install TestCafe, open cmd and type:
    npm install -g testcafe
Writing a test is not rocket science. Here is a quick start:
1) Copy and paste the following code into your text editor and save it as "test.js":
    import { Selector } from 'testcafe';

    fixture `Getting Started`
        .page `http://devexpress.github.io/testcafe/example`;

    test('My first test', async t => {
        await t
            .typeText('#developer-name', 'John Smith')
            .click('#submit-button')
            .expect(Selector('#article-header').innerText).eql('Thank you, John Smith!');
    });
2) Run the test in your browser (e.g. Chrome) by typing the following command in cmd:
    testcafe chrome test.js
3) Get the descriptive result in the console output.
TestCafe allows you to test against various browsers: local, remote (on devices, be it a browser for Raspberry Pi or Safari for iOS), cloud (e.g. Sauce Labs), or headless (e.g. Nightmare). This means that you can easily use TestCafe with your Continuous Integration infrastructure.

-
It would be great to know from an answer whether the person recommending a framework is affiliated with it, @HelenDikareva – lukas_o Apr 09 '20 at 10:08
I suggest Selenium + PhantomJSDriver (GhostDriver), which is used for GUI-less browser automation. With this you can easily navigate through pages, select elements (you can select the flights), submit forms, and also perform some scraping. JavaScript is also supported.
You can go through the Selenium documentation here. You will have to download the phantomjs.exe file.
A good tutorial for PhantomJSDriver is given here.
Config of PhantomJSDriver (from the tutorial):
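    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.phantomjs.PhantomJSDriver;
    import org.openqa.selenium.phantomjs.PhantomJSDriverService;
    import org.openqa.selenium.remote.DesiredCapabilities;

    DesiredCapabilities caps = new DesiredCapabilities();
    caps.setJavascriptEnabled(true); // not really needed: JS enabled by default
    caps.setCapability(PhantomJSDriverService.PHANTOMJS_EXECUTABLE_PATH_PROPERTY, "C://phantomjs.exe");
    caps.setCapability("takesScreenshot", true);
    WebDriver driver = new PhantomJSDriver(caps);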
Another option (this one does not require WebDriver): PhantomJS
PhantomJS is a headless WebKit scriptable with a JavaScript API. It has fast and native support for various web standards: DOM handling, CSS selector, JSON, Canvas, and SVG.
This is GUI-less and also has the ability to take screenshots.
Example (from here):
    var page = require('webpage').create();
    page.open('http://example.com', function(status) {
        console.log("Status: " + status);
        if (status === "success") {
            page.render('example.png');
        }
        phantom.exit();
    });
PS: I would suggest JSoup for web scraping, but it does not support JavaScript. For Python, there is also something called Ghost.py.

-
Thanks for your comment. I have tried ChromeDriver as well as the headless PhantomJSDriver. Neither is very fast at finding the elements. Ghost.py looks very interesting; I will read more about it. Additionally, I found something about a Python web scraper called Scrapy. I will try to find out if it works faster. – Robert Smit Apr 16 '15 at 11:37
-
Yes, you could try Scrapy. I believe there is something called BeautifulSoup as well. I suggest you edit the question title to something specific, since people are downvoting because they think this is an opinion-based question. – LittlePanda Apr 16 '15 at 11:40
What about LeanFT? It's a new HP product that works with C# and Java and users say they switched to LeanFT "because Selenium couldn’t handle all of [their] applications."

If you use the HtmlUnit WebDriver, there is no overhead of running a browser, so the code can run much faster. You could speed things up even more by abandoning a framework/toolset altogether, querying pages directly, and parsing them for what you need. However, this makes maintenance and updating a pain.
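As a rough Python sketch of the "query pages directly" route, using only the standard library: the names `fetch_html` and `scrape_all` are invented for this example, the `User-Agent` string is arbitrary, and running the downloads on a small thread pool is an addition of this sketch rather than something the answer proposes. It can only work when the data is already present in the HTML the server returns; pages that build their tables with JavaScript would still need a browser or a headless engine.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import Request, urlopen


def fetch_html(url, timeout=10):
    """Download one page with a plain HTTP GET; no browser is started."""
    request = Request(url, headers={"User-Agent": "table-scraper-sketch"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def scrape_all(urls, parse, fetch=fetch_html, workers=8):
    """Fetch every URL on a thread pool and apply `parse` to each page.

    `parse` is whatever function extracts the wanted data from an HTML
    string; `fetch` can be swapped out, for example in tests.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pages = pool.map(fetch, urls)  # results come back in input order
        return [parse(html) for html in pages]
```

A call might look like `scrape_all(list_of_urls, parse=my_table_parser)`, where `my_table_parser` is a hypothetical function that pulls the table out of each page's HTML. The maintenance cost the answer mentions lives in that parser, which has to be updated whenever the site's markup changes.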
