Final redirected URL using Selenium Htmlunit Java

Question

I have a question that I think could be solved using Selenium. I have a set of URLs like the one below:

http://www.sears.com/search=little tikes&Little Tikes?filter=Brand&keywordSearch=false&vName=Toys+%26+Games&catalogId=12605&catPrediction=false&previousSort=ORIGINAL_SORT_ORDER&viewItems=50&storeId=10153&adCell=W3

If you paste the URL as it is into a browser (Firefox, for example), it ends up redirecting to another URL, which you can verify in the address bar. I need to get the redirected URL, regardless of whether the redirect came from JavaScript code or not. Is it possible to do this using the Selenium framework?

I have already tried using HtmlUnit for this, but I get the following JavaScript execution error. Please help!

com.gargoylesoftware.htmlunit.ScriptException: TypeError: Cannot call method "indexOf" of null (script in http://www.sears.com/search=little%20tikes&Little%20Tikes?filter=Brand&keywordSearch=false&catalogId=12605&adCell=W3&catPrediction=false&previousSort=ORIGINAL_SORT_ORDER&viewItems=50&storeId=10153&levels=Toys+%26+Games from (6942, 33) to (6974, 14)#6966)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:669) ~[htmlunit-2.12.jar:2.12]
    at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:601) ~[htmlunit-core-js-2.12.jar:?]
    at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:507) ~[htmlunit-core-js-2.12.jar:?]
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:555) ~[htmlunit-2.12.jar:2.12]
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:530) ~[htmlunit-2.12.jar:2.12]
    at com.gargoylesoftware.htmlunit.html.HtmlPage.executeJavaScriptIfPossible(HtmlPage.java:979) ~[htmlunit-2.12.jar:2.12]
    at com.gargoylesoftware.htmlunit.html.HtmlScript.executeInlineScriptIfNeeded(HtmlScript.java:337) ~[htmlunit-2.12.jar:2.12]
    at com.gargoylesoftware.htmlunit.html.HtmlScript.executeScriptIfNeeded(HtmlScript.java:415) ~[htmlunit-2.12.jar:2.12]
    at com.gargoylesoftware.htmlunit.html.HtmlScript$3.execute(HtmlScript.java:266) ~[htmlunit-2.12.jar:2.12]
    at com.gargoylesoftware.htmlunit.html.HtmlScript.onAllChildrenAddedToPage(HtmlScript.java:276) ~[htmlunit-2.12.jar:2.12]
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement(HTMLParser.java:676) ~[htmlunit-2.12.jar:2.12]
    at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source) ~[xercesImpl-2.10.0.jar:?]
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement(HTMLParser.java:635) ~[htmlunit-2.12.jar:2.12]
    at org.cyberneko.html.HTMLTagBalancer.callEndElement(HTMLTagBalancer.java:1170) ~[nekohtml-1.9.18.jar:1.9.18]
    at org.cyberneko.html.HTMLTagBalancer.endElement(HTMLTagBalancer.java:1072) ~[nekohtml-1.9.18.jar:1.9.18]
    at org.cyberneko.html.filters.DefaultFilter.endElement(DefaultFilter.java:206) ~[nekohtml-1.9.18.jar:?]
    at org.cyberneko.html.filters.NamespaceBinder.endElement(NamespaceBinder.java:330) ~[nekohtml-1.9.18.jar:?]
    at org.cyberneko.html.HTMLScanner$ContentScanner.scanEndElement(HTMLScanner.java:3074) ~[nekohtml-1.9.18.jar:1.9.18]
    at org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:2041) ~[nekohtml-1.9.18.jar:1.9.18]
    at org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:918) ~[nekohtml-1.9.18.jar:1.9.18]
    at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:499) ~[nekohtml-1.9.18.jar:1.9.18]
    at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:452) ~[nekohtml-1.9.18.jar:1.9.18]
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source) ~[xercesImpl-2.10.0.jar:?]
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.parse(HTMLParser.java:892) ~[htmlunit-2.12.jar:2.12]
    at com.gargoylesoftware.htmlunit.html.HTMLParser.parse(HTMLParser.java:241) ~[htmlunit-2.12.jar:2.12]
    at com.gargoylesoftware.htmlunit.html.HTMLParser.parseHtml(HTMLParser.java:187) ~[htmlunit-2.12.jar:2.12]
    at com.gargoylesoftware.htmlunit.DefaultPageCreator.createHtmlPage(DefaultPageCreator.java:268) ~[htmlunit-2.12.jar:2.12]
    at com.gargoylesoftware.htmlunit.DefaultPageCreator.createPage(DefaultPageCreator.java:156) ~[htmlunit-2.12.jar:2.12]
    at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto(WebClient.java:434) ~[htmlunit-2.12.jar:2.12]
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:309) ~[htmlunit-2.12.jar:2.12]
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:374) ~[htmlunit-2.12.jar:2.12]

This question could be a duplicate of: http://stackoverflow.com/questions/20315330/how-to-overcame-htmlunit-scriptexception – Mosty Mostacho, Dec 31 '13 at 16:15
It does not seem to be a duplicate: the use case is different (click vs. get), and the exception stack trace is completely different as well. – user1965449, Dec 31 '13 at 16:44

A Paul · Accepted Answer · 2014-01-01T03:23:17.890

1

This should be easy, if I have understood your question. The steps are below.

1. Get a FirefoxDriver object.
2. Call driver.get("http://www.sears.com/search=little tikes&Little Tikes?filter=Brand&keywordSearch=false&vName=Toys+%26+Games&catalogId=12605&catPrediction=false&previousSort=ORIGINAL_SORT_ORDER&viewItems=50&storeId=10153&adCell=W3"); This opens the URL in Firefox. Once opened, the URL will be forwarded to the actual URL (this is my understanding from your description).
3. Then call driver.getCurrentUrl(). This returns the URL the browser ended up on.

Let me know if this works for you :)
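One caveat with the steps above: a JavaScript redirect can fire after the page has loaded, so reading the URL immediately may still return the original address. A minimal sketch of one way to handle that, assuming it is acceptable to poll until the URL stops changing: the helper below is hypothetical (not part of Selenium) and takes any `Supplier<String>`; with Selenium you would pass `driver::getCurrentUrl`. The 50 ms polling interval is an arbitrary choice.

```java
import java.time.Duration;
import java.util.Objects;
import java.util.function.Supplier;

/** Polls a URL source until it stops changing. All names here are hypothetical. */
final class RedirectWaiter {

    /**
     * Returns the URL once it has stayed unchanged for `quiet`,
     * or the most recently seen URL if `timeout` elapses first.
     */
    static String waitForStableUrl(Supplier<String> currentUrl, Duration quiet, Duration timeout) {
        long deadline = System.nanoTime() + timeout.toNanos();
        String last = currentUrl.get();
        long lastChange = System.nanoTime();
        while (deadline - System.nanoTime() > 0) {
            try {
                Thread.sleep(50); // polling interval, chosen arbitrarily
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop waiting
                return last;
            }
            String now = currentUrl.get();
            if (!Objects.equals(now, last)) {
                last = now;                      // another redirect happened, restart the quiet period
                lastChange = System.nanoTime();
            } else if (System.nanoTime() - lastChange >= quiet.toNanos()) {
                return last;                     // unchanged for long enough
            }
        }
        return last;
    }

    public static void main(String[] args) {
        // Stand-in for a browser that goes through two redirects.
        String[] hops = {"http://example.com/start", "http://example.com/hop", "http://example.com/final"};
        int[] calls = {0};
        Supplier<String> fakeBrowser = () -> hops[Math.min(calls[0]++, hops.length - 1)];
        System.out.println(waitForStableUrl(fakeBrowser, Duration.ofMillis(200), Duration.ofSeconds(5)));
    }
}
```

With a real driver the call would look like `waitForStableUrl(driver::getCurrentUrl, Duration.ofSeconds(1), Duration.ofSeconds(15))`; the durations are guesses and would need tuning per site.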

UPDATE :

        import com.gargoylesoftware.htmlunit.BrowserVersion;
        import com.gargoylesoftware.htmlunit.WebClient;
        import com.gargoylesoftware.htmlunit.html.HtmlPage;

        // HtmlUnit 2.12: headless client that emulates a browser inside the JVM
        WebClient webClient = new WebClient(BrowserVersion.INTERNET_EXPLORER_9);
        webClient.getOptions().setJavaScriptEnabled(true);
        webClient.getOptions().setRedirectEnabled(true);
        // Keep loading the page even if one of its scripts throws
        webClient.getOptions().setThrowExceptionOnScriptError(false);
        webClient.getOptions().setCssEnabled(true);
        HtmlPage page = webClient.getPage("http://www.sears.com/search=little tikes&Little Tikes?filter=Brand&keywordSearch=false&vName=Toys+%26+Games&catalogId=12605&catPrediction=false&previousSort=ORIGINAL_SORT_ORDER&viewItems=50&storeId=10153&adCell=W3");
        // URL of the page HtmlUnit ended up loading
        System.out.println(page.getUrl());
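Note that the URL in the question contains raw spaces, while the stack trace shows the same URL with the spaces written as %20. If you build the URL list yourself before handing it to a client, it may be worth normalizing the spaces up front. A minimal sketch, assuming that spaces are the only unescaped characters and that existing escapes such as %26 must be left alone; `encodeSpaces` is a hypothetical helper and the URL is a shortened form of the one in the question.

```java
import java.net.URI;

/** Escapes raw spaces in an otherwise already-encoded URL. Names are hypothetical. */
final class UrlSpaces {

    /** Replaces each raw space with %20 and leaves existing escapes (e.g. %26) untouched. */
    static String encodeSpaces(String url) {
        return url.trim().replace(" ", "%20");
    }

    public static void main(String[] args) {
        // Shortened form of the URL from the question
        String raw = "http://www.sears.com/search=little tikes&Little Tikes?filter=Brand&vName=Toys+%26+Games&storeId=10153";
        String encoded = encodeSpaces(raw);
        // URI.create throws IllegalArgumentException if an illegal character such as a space remains
        URI uri = URI.create(encoded);
        System.out.println(uri);
    }
}
```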

edited Jan 01 '14 at 03:23

answered Dec 31 '13 at 07:37

Thanks ABP, I will try this once I rule out HtmlUnit. But when you say "This will open the url in firefox", will it open an actual browser window, or is it mimicked inside the Java program? Thanks – user1965449 Dec 31 '13 at 16:00
It will open a Firefox browser instance. – A Paul Dec 31 '13 at 17:20
How can this be done in a simulated fashion, meaning the Java program simulates a web browser rather than driving an actual browser? Thanks. – user1965449 Dec 31 '13 at 18:50
If you do not want to open Firefox and want the Java program itself to load the HTML, then you have to use HtmlUnitDriver. But HtmlUnitDriver has issues with JavaScript; I have faced JavaScript issues that I was not able to solve, even after trying everything. Also, please check the UPDATE in my post. I added another code sample that might work for you. – A Paul Jan 01 '14 at 03:21
Thanks! I am doing exactly the same thing, but using FIREFOX_17 instead. It works, but it is way too slow, so much so that I cannot even debug in Eclipse. I need to parse many URLs and need to scale, and I am not sure what options I have. – user1965449 Jan 01 '14 at 07:04
Use my first option with the Firefox driver; it is faster than HtmlUnit and has no issues with JavaScript. One last thing: if this answers your question, please select my answer as correct. :) – A Paul Jan 01 '14 at 07:41
Yes, I have tested it and it is fast, but I have to open thousands, possibly millions, of URLs, so opening a new browser for each is not an option. Thanks. – user1965449 Jan 01 '14 at 18:41
Actually, you do not have to open a new browser each time. A browser window corresponds to a driver instance, so you can reuse the same driver (and browser) to open all the URLs. – A Paul Jan 01 '14 at 19:21
I mean that having a browser window, one or more, is not an option, because I need to be able to pull multiple web pages concurrently. Thanks! – user1965449 Jan 01 '14 at 20:51
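The two requirements in this thread (reuse a session instead of opening one per URL, and still fetch pages concurrently) can be combined with a small fixed pool. The sketch below is only an illustration under assumptions: a "session" is modeled as a hypothetical `Function<String, String>` that maps a requested URL to its final URL. In practice each function would wrap one driver or one headless client instance; that wiring is not shown here, and all class and method names are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Resolves many URLs concurrently with a fixed set of reusable sessions. Names are hypothetical. */
final class SessionPool {

    /**
     * Each session is borrowed by one task at a time and returned afterwards,
     * so N sessions serve any number of URLs. Results keep the order of `urls`.
     */
    static List<String> resolveAll(List<String> urls, List<Function<String, String>> sessions) {
        BlockingQueue<Function<String, String>> idle =
                new ArrayBlockingQueue<>(sessions.size(), false, sessions);
        ExecutorService workers = Executors.newFixedThreadPool(sessions.size());
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String url : urls) {
                pending.add(workers.submit(() -> {
                    Function<String, String> session = idle.take(); // borrow a free session
                    try {
                        return session.apply(url);
                    } finally {
                        idle.add(session); // always hand it back, even on failure
                    }
                }));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : pending) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while resolving URLs", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a session failed", e.getCause());
        } finally {
            workers.shutdown();
        }
    }

    public static void main(String[] args) {
        // Fake session standing in for a real browser or headless client.
        Function<String, String> fake = url -> url.replace("/old/", "/new/");
        List<String> urls = Arrays.asList("http://example.com/old/1", "http://example.com/old/2",
                "http://example.com/old/3");
        System.out.println(resolveAll(urls, Arrays.asList(fake, fake)));
    }
}
```

The number of sessions caps both memory use and concurrency, which is the knob to tune when the URL list runs into the thousands or millions.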

score 1 · Answer 2 · answered Dec 31 '13 at 10:27

If you are using HtmlUnitDriver, enable JavaScript (it is off by default) as shown below.

Moreover, HtmlUnit uses Rhino as its JavaScript engine, which differs from the JS engines of mainstream browsers.

        HtmlUnitDriver Browser_Session = new HtmlUnitDriver();
        Browser_Session.setJavascriptEnabled(true);

or

        HtmlUnitDriver Browser_Session = new HtmlUnitDriver(true);

The steps below should fetch the redirected URL.

        Browser_Session.navigate().to("URL");
        Browser_Session.getCurrentUrl(); // This fetches the current redirected URL.

Hope this helps

Thanks! I was using the HtmlUnit WebClient class with JavaScript enabled, and apparently the Rhino JS engine was throwing this exception. Are you saying that if I use HtmlUnitDriver instead, it will not cause this JavaScript execution exception because it does not use the Rhino engine? Thanks. – user1965449, Dec 31 '13 at 16:02
