4

I'm trying to get a page's source code using Selenium, following the standard approach. It works for Baidu.com and example.com, but for the URL I actually need I get an empty page: the source contains nothing but empty tags. My code is below. Is there anything I'm missing?

I tried adding more options parameters, but that didn't seem to help.

    import java.util.concurrent.TimeUnit;
    import org.jsoup.Jsoup;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.chrome.ChromeDriver;

    WebDriver driver;

    System.setProperty("webdriver.chrome.driver", "E:\\applications\\ChromeDriver\\chromedriver_win32 (2)\\chromedriver.exe");

    // Instantiate a WebDriver object; this launches the Chrome browser
    driver = new ChromeDriver();

    driver.manage().timeouts().implicitlyWait(2, TimeUnit.SECONDS);

    driver.get("http://rd.huangpuqu.sh.cn/website/html/shprd/shprd_tpxw/List/list_0.htm");
    String pageSource = driver.getPageSource();
    String title = driver.getTitle();
    System.out.println("===========" + title + "==============");
    System.out.println(Jsoup.parse(pageSource));

I expected to get the parsed page source of the URL so that I could extract the information I need, but I'm stuck here.

HneryInSH

2 Answers

5

I could reproduce the issue with this website when using ChromeDriver. What I found is that a script on the site detects that you are using ChromeDriver and blocks the request for the web page with an HTTP 400 error code.

Firefox, however, works as expected with the following code:

    FirefoxDriver driver = new FirefoxDriver();

    driver.get("http://rd.huangpuqu.sh.cn/website/html/shprd/shprd_tpxw/List/list_0.htm");
    Thread.sleep(5000);
    String pageSource = driver.getPageSource();
    String title = driver.getTitle();
    System.out.println("==========="+title+"==============");
    System.out.println(Jsoup.parse(pageSource));

    driver.quit();

I just used a 5-second sleep, which worked. The best practice is to wait for a specific element on your page; for reference, see "How to wait until an element is present in Selenium?"
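To illustrate the difference between a fixed sleep and waiting for a condition, here is a minimal sketch in plain Java so it runs without a browser. The class name `PollingWait` and the method `waitUntil` are hypothetical names made up for this example, not part of Selenium. Selenium ships its own `WebDriverWait` and `ExpectedConditions` classes for this purpose, and those are what you would normally use with a real driver.

```java
import java.util.function.BooleanSupplier;

class PollingWait {

    /**
     * Checks the condition repeatedly until it returns true or the timeout
     * elapses. Returns true if the condition was met in time, false otherwise.
     */
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out without the condition becoming true
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Stand-in for "the AJAX content has arrived": becomes true after about 300 ms.
        boolean loaded = waitUntil(() -> System.currentTimeMillis() - start >= 300, 5000, 50);
        System.out.println("content loaded: " + loaded);
    }
}
```

With a real driver, the condition could be something like `() -> !driver.findElements(someLocator).isEmpty()`, where `someLocator` is a placeholder for a `By` locator matching an element that only exists once the content has loaded. Unlike `Thread.sleep(5000)`, this returns as soon as the element shows up and gives up cleanly if it never does.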

Firefox browser version: 67.0.1, geckodriver: 0.24.0, Selenium version: 3.141.59

Adi Ohana
  • Thanks for the heads-up. I tried downloading Firefox and its WebDriver for Selenium, but there seems to be a compatibility problem. Could you tell me which versions of these three you are using? That would be really helpful. @Adi Ohana – HneryInSH Jun 14 '19 at 02:36
  • @user9786842 I edited my answer with the versions I'm using – Adi Ohana Jun 14 '19 at 02:42
  • Thank you for the info. I've changed to these versions as needed, but I get the error `Unable to connect to host 127.0.0.1 on port 7055 after 45000 ms.` I did a little searching, and lots of answers suggest it's a compatibility issue. I guess I'll downgrade Firefox and give it another try. – HneryInSH Jun 14 '19 at 04:49
  • And I hope this is not too much to ask: with `selenium version: 3.141.59`, does this mean I should just add the following dependency to my pom.xml, or are there lots of others I need to add? `<dependency><groupId>org.seleniumhq.selenium</groupId><artifactId>selenium-java</artifactId><version>3.141.59</version></dependency>` – HneryInSH Jun 14 '19 at 04:53
  • Yes, only add that dependency – Adi Ohana Jun 14 '19 at 04:54
  • Okay, any idea about the `Unable to connect to host 127.0.0.1 on port 7055 after 45000 ms.` error? – HneryInSH Jun 14 '19 at 04:58
  • Which OS are you running, and how do you run your Java program? – Adi Ohana Jun 14 '19 at 04:59
  • My OS is Windows 10 and I run the code with IDEA 2019.1.2 – HneryInSH Jun 14 '19 at 05:12
  • I suggest you try downgrading the Firefox browser to version 61 and geckodriver to v0.20.1 – Adi Ohana Jun 14 '19 at 05:17
  • BTW, my Firefox version is 67.0.2; this is what I got when I downloaded Firefox from the official site. It's a minor version difference, so I don't think it will matter much. – HneryInSH Jun 14 '19 at 05:19
0

First of all, it is definitely a compatibility problem. Selenium has gone through a lot of development, so there are plenty of version-compatibility issues. Here is how I finally dealt with it.

I chose to drive Firefox, version 67.0 (64-bit), because Chrome responds with a blank result, as @Adi Ohana mentioned. I use Selenium 3.x; to use it, I added the following to pom.xml:

    <dependency>
        <groupId>org.seleniumhq.selenium</groupId>
        <artifactId>selenium-server</artifactId>
        <version>3.141.59</version> <!-- this version matters -->
    </dependency>

Note that, in my case, it was <artifactId>selenium-server</artifactId> that I needed to add to pom.xml; otherwise you may get unexpected errors.

With that done, you need a matching driver. The driver for Firefox is named geckodriver; I use v0.24.0. It is an .exe file rather than a .jar, so you specify its location in Java code like this:

    // The second argument is the location of geckodriver.exe (v0.24.0) on your local machine
    System.setProperty("webdriver.gecko.driver", "E:\\applications\\GeckoDriver-v0.24.0-win64\\geckodriver.exe");

Then send a request for the URL. Since the body content is loaded by a separate AJAX request, you need to wait a couple of seconds for it to finish loading.

    Thread.sleep(5000); // this is the easiest way, though maybe not the best

Conclusion: I get the original source code as I expected, but I have not figured out why ChromeDriver does not work as expected. I may leave that for further digging.

To sum up: Firefox 67.0; geckodriver v0.24.0 (specified in Java code); Selenium 3.x (added via pom.xml).

Thanks to all of you, it's been really helpful. I like this community.

PS: I'm new to Stack Overflow and still learning the ropes...

HneryInSH