
I want to get the XPath of a Facebook post using HtmlUnit. These two questions give more background on what I am trying to do:

  1. Supernatural behaviour with a facebook page
  2. HtmlUnit commenting out lines of facebook page

To reproduce what I did, you can follow question 1. The HTML of the Facebook page is on pastebin: http://pastebin.com/MfXsYSJQ.

Alternatively, you can go to https://www.facebook.com/bhramakarserver. I just want the XPath of the span containing the post with the text "Hi! this is the first post of this page." This is what I tried:

    import java.io.IOException;

    import com.gargoylesoftware.htmlunit.BrowserVersion;
    import com.gargoylesoftware.htmlunit.NicelyResynchronizingAjaxController;
    import com.gargoylesoftware.htmlunit.WebClient;
    import com.gargoylesoftware.htmlunit.html.HtmlInput;
    import com.gargoylesoftware.htmlunit.html.HtmlPage;
    import com.gargoylesoftware.htmlunit.html.HtmlSpan;

    public class ForStackOverflow {
        public static void main(String[] args) throws IOException {
            WebClient client = new WebClient(BrowserVersion.FIREFOX_17);
            client.getOptions().setJavaScriptEnabled(true);
            client.getOptions().setRedirectEnabled(true);
            client.getOptions().setThrowExceptionOnScriptError(true);
            client.getOptions().setCssEnabled(true);
            client.getOptions().setUseInsecureSSL(true);
            client.getOptions().setThrowExceptionOnFailingStatusCode(false);
            client.setAjaxController(new NicelyResynchronizingAjaxController());

            HtmlPage page1 = client.getPage("https://www.facebook.com/bhramakarserver");
            System.out.println(page1.asXml());

            // An XPath for the submit input matches and prints fine
            HtmlInput input = (HtmlInput) page1.getByXPath("/html/body//input[@type='submit']").get(0);
            System.out.println(input.asXml());

            // Getting the span with class="userContent".
            // This line fails because the XPath matches nothing in page1
            HtmlSpan span = (HtmlSpan) page1.getByXPath("/html/body//span[@class='userContent']").get(0);
        }
    }

The problem seems to be that page1 holds only the static HTML. In it, the span element:

    <span data-ft="&#123;&quot;tn&quot;:&quot;K&quot;&#125;" class="userContent">Hi! this is the  first post of this page.</span>

is generated dynamically, so it appears commented out in the HTML of page1. When I inspect the page in a browser with "Inspect Element", however, the span appears as a normal element, so it must be uncommented dynamically. Is there any way to get page1's HTML in the state it has after all dynamic content has loaded, so that the XPath resolves correctly? Can this be done with Selenium WebDriver?

rahulserver

1 Answer


Given that information, it seems fair to assume that some AJAX call is not being fired, or that you're not waiting long enough for the AJAX call to finish. I haven't had the best results with that AJAX controller. Sadly, a polling loop is usually the best way to go.

I've explained how to do that in this question: Get the changed HTML content after it's updated by Javascript? (htmlunit)
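As a rough illustration of the loop idea (a minimal sketch, not the code from the linked question): the hypothetical helper `pollUntilNonEmpty` below re-runs a query until it returns something or the attempts run out. The class name, the helper, and the fake query in `main` are all assumptions made up for this example; with HtmlUnit the query would wrap a call such as `page1.getByXPath(...)`, as hinted in the closing comment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class PollingSketch {

    /**
     * Re-runs the query until it returns a non-empty list or the attempts run out.
     * Returns the last result, which is empty if nothing ever matched.
     * (Hypothetical helper, not part of HtmlUnit.)
     */
    static <T> List<T> pollUntilNonEmpty(Supplier<List<T>> query,
                                         int maxAttempts,
                                         long pauseMillis) {
        List<T> result = query.get();
        for (int attempt = 1; attempt < maxAttempts && result.isEmpty(); attempt++) {
            try {
                Thread.sleep(pauseMillis); // give background scripts time to run
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                break;
            }
            result = query.get();
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for the page: the "element" only shows up on the third query,
        // imitating content that scripts insert after the initial load.
        final int[] calls = {0};
        Supplier<List<String>> fakeQuery = () -> {
            calls[0]++;
            List<String> found = new ArrayList<>();
            if (calls[0] >= 3) {
                found.add("Hi! this is the first post of this page.");
            }
            return found;
        };

        List<String> spans = pollUntilNonEmpty(fakeQuery, 10, 50);
        System.out.println(spans.isEmpty() ? "not found" : spans.get(0));

        // With HtmlUnit the query would look roughly like this (assumption, untested here):
        //   () -> { client.waitForBackgroundJavaScript(1000);
        //           return page1.getByXPath("//span[@class='userContent']"); }
    }
}
```

Capping the number of attempts keeps the loop from hanging forever when the element never appears, for example when a script error prevents the content from being inserted at all.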

If this doesn't do the trick, then you're probably getting a JavaScript exception. I've written up some possible workarounds for that situation in this other question: How to overcome an HTMLUnit ScriptException?

If none of these work, then I'd recommend using something other than HtmlUnit. Any real browser driver would do the trick. Alternatives such as PhantomJS or ZombieJS might also work.

Mosty Mostacho
  • Thanks for your quick response! I have upvoted your answer as it is really cool. However, looking at the code of the Facebook page, I don't know which JavaScript function I need to wait for. This needs further homework from me now! – rahulserver Jan 26 '14 at 05:32