I'm trying to get a page's source code using Selenium, but I get an empty page

Question

I'm trying to get a page's source code using Selenium with fairly standard boilerplate code. It works for baidu.com and example.com, but for the URL I actually need, I get an empty page: the source contains nothing but empty tags. Is there anything I'm missing?

I tried adding more parameters to the options, but it didn't seem to help.

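    import java.util.concurrent.TimeUnit;

    import org.jsoup.Jsoup;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.chrome.ChromeDriver;

    public class PageSourceDemo {
        public static void main(String[] args) {
            System.setProperty("webdriver.chrome.driver", "E:\\applications\\ChromeDriver\\chromedriver_win32 (2)//chromedriver.exe");

            // Instantiate a WebDriver object; purpose: launch the Google Chrome browser
            WebDriver driver = new ChromeDriver();
            driver.manage().timeouts().implicitlyWait(2, TimeUnit.SECONDS);

            driver.get("http://rd.huangpuqu.sh.cn/website/html/shprd/shprd_tpxw/List/list_0.htm");
            String pageSource = driver.getPageSource();
            String title = driver.getTitle();
            System.out.println("===========" + title + "==============");
            System.out.println(Jsoup.parse(pageSource));

            driver.quit(); // close the browser and end the driver session
        }
    }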

I expect to get the parsed page source of the URL so that I can extract the information I need, but I'm stuck here.

What's the output? Try changing the driver path and see whether it throws an exception; that way you can make sure the problem isn't that the driver can't be found. — Jason Young, Jun 11 '19 at 02:17
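To follow this suggestion without swapping paths by hand, you could check that the driver file exists before the WebDriver is created. The sketch below is a hypothetical helper (`DriverSetup.requireDriver` is not part of Selenium); it verifies the path and only then sets the same system property the question uses (`webdriver.chrome.driver`, or `webdriver.gecko.driver` for Firefox).

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not a Selenium API): fail fast with a clear message
// if the driver executable is missing, instead of failing later inside WebDriver.
final class DriverSetup {
    private DriverSetup() {}

    /**
     * Checks that {@code path} points to an existing regular file and, if so,
     * stores its absolute path in the given system property
     * (e.g. "webdriver.chrome.driver" or "webdriver.gecko.driver").
     */
    static Path requireDriver(String property, String path) {
        Path driver = Paths.get(path);
        if (!Files.isRegularFile(driver)) {
            throw new IllegalStateException(
                    "Driver not found for " + property + ": " + driver.toAbsolutePath());
        }
        System.setProperty(property, driver.toAbsolutePath().toString());
        return driver;
    }
}
```

Calling `DriverSetup.requireDriver("webdriver.chrome.driver", ...)` with the question's path, right before `new ChromeDriver()`, would rule out a missing driver as the cause before looking at the page itself.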

score 5 · Accepted Answer · Adi Ohana · 2019-06-14T02:41:51.057

I could reproduce the issue with this website when using ChromeDriver. What I found is that a JavaScript script on the page detects that you are using ChromeDriver and blocks the request to the page with an HTTP 400 error.

Firefox, on the other hand, works as expected with the following code:

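    import org.jsoup.Jsoup;
    import org.openqa.selenium.firefox.FirefoxDriver;

    public class FirefoxPageSource {
        public static void main(String[] args) throws InterruptedException {
            FirefoxDriver driver = new FirefoxDriver();

            driver.get("http://rd.huangpuqu.sh.cn/website/html/shprd/shprd_tpxw/List/list_0.htm");
            Thread.sleep(5000); // crude fixed wait to let the page content load
            String pageSource = driver.getPageSource();
            String title = driver.getTitle();
            System.out.println("===========" + title + "==============");
            System.out.println(Jsoup.parse(pageSource));

            driver.quit();
        }
    }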

I just used a 5-second sleep, which worked. Best practice, though, is to wait for a specific element on the page; for reference, see "How to wait until an element is present in Selenium?"
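As an illustration of waiting for a condition rather than a fixed time, here is a minimal pure-Java polling loop. `Poll.until` is a hypothetical helper, not a Selenium API; in real Selenium 3.x code the built-in equivalent is `new WebDriverWait(driver, 10).until(ExpectedConditions.presenceOfElementLocated(By.cssSelector(contentSelector)))`, where `contentSelector` is a placeholder for a selector you would have to pick for this page yourself.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper (not a Selenium API): wait for a condition instead of a fixed sleep.
final class Poll {
    private Poll() {}

    /**
     * Evaluates {@code condition} every {@code intervalMillis} ms until it returns true
     * or {@code timeoutMillis} ms have elapsed. Returns true if the condition was met.
     */
    static boolean until(BooleanSupplier condition, long timeoutMillis, long intervalMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;                          // condition met: stop waiting early
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;                         // timed out: let the caller decide
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();   // preserve the interrupt flag
                return false;
            }
        }
    }
}
```

With a driver, the condition could be `() -> !driver.findElements(By.cssSelector(contentSelector)).isEmpty()` (same placeholder selector). Unlike `Thread.sleep(5000)`, this returns as soon as the content appears and reports a timeout instead of silently continuing with an incomplete page.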

Versions: Firefox 67.0.1, geckodriver 0.24.0, Selenium 3.141.59

answered Jun 12 '19 at 19:38 by Adi Ohana · edited Jun 14 '19 at 02:41


Thanks for the heads-up. I tried downloading Firefox and its WebDriver for Selenium, but there seems to be a compatibility problem. Could you tell me which versions of these three you are using? That would be really helpful. @Adi Ohana – HneryInSH Jun 14 '19 at 02:36
@user9786842 I edited my answer with the versions I'm using. – Adi Ohana Jun 14 '19 at 02:42
Thank you for the info. I've switched to these versions, but I get the error `Unable to connect to host 127.0.0.1 on port 7055 after 45000 ms.` After a little searching, I found that many answers suggest it's a compatibility issue. I guess I'll downgrade Firefox and give it another try. – HneryInSH Jun 14 '19 at 04:49
And, I hope this isn't too much to ask: with `selenium version: 3.141.59`, should I just add the following dependency to my pom.xml, or is there a lot more I need to add? `org.seleniumhq.selenium:selenium-java:3.141.59` – HneryInSH Jun 14 '19 at 04:53
Yes, only add the dependency. – Adi Ohana Jun 14 '19 at 04:54
Okay, any idea about the `Unable to connect to host 127.0.0.1 on port 7055 after 45000 ms.` error? – HneryInSH Jun 14 '19 at 04:58
What OS are you running, and how do you run your Java program? – Adi Ohana Jun 14 '19 at 04:59
My OS is Windows 10, and I run the code from IntelliJ IDEA 2019.1.2. – HneryInSH Jun 14 '19 at 05:12
I suggest you try downgrading Firefox to version 61 and geckodriver to v0.20.1. – Adi Ohana Jun 14 '19 at 05:17
BTW, my Firefox version is 67.0.2, which is what I got when I downloaded Firefox from the official site. It's a minor difference; I don't think it matters much. – HneryInSH Jun 14 '19 at 05:19

score 0 · Answer 2 · answered Jun 17 '19 at 07:18

First of all, it's definitely a compatibility problem. It's mainly because of Selenium: it has been through a lot of development, so there are tons of version-compatibility problems. Here is how I finally dealt with it.

I chose Firefox as the browser to drive, version 67.0 (64-bit), because Chrome responds with a blank result, as @Adi Ohana mentioned. I use Selenium 3.x; to use Selenium 3.x, I added the following to pom.xml:

    <dependency>
        <groupId>org.seleniumhq.selenium</groupId>
        <artifactId>selenium-server</artifactId>
        <version>3.141.59</version> <!-- the version matters here -->
    </dependency>

Note that it's `<artifactId>selenium-server</artifactId>` you need to add to your pom.xml; otherwise, you may get some unexpected errors.

With that done, you need the proper driver. The driver for Firefox is called geckodriver. I use v0.24.0; it's an .exe file rather than a .jar, so you specify its location in Java code like this:

    // geckodriver 0.24.0; the second argument is the location of geckodriver.exe on your local machine
    System.setProperty("webdriver.gecko.driver", "E:\\applications\\GeckoDriver-v0.24.0-win64\\geckodriver.exe");

Then send a request for the URL. Since the body content is loaded by a separate AJAX request, you need to wait a couple of seconds for Selenium to pick it up:

    Thread.sleep(5000); // the easiest way, though maybe not the best

Conclusion: I get the original source code as I expected, but I haven't worked out why ChromeDriver doesn't work as expected. I may leave that for further digging.

To sum up: Firefox 67.0; geckodriver v0.24.0 (specified in Java code); Selenium 3.x (added via pom.xml).

Thanks to all of you, it's been really helpful. I like this community.

PS: I'm new to Stack Overflow and still learning the ropes...
