I have a problem parsing this page. I can get the price, the airline, the departure time, etc. without any trouble.

But what I want is the flight number, and it's hidden: you need to click on a flight, and the menu expands to show it.

[image]

This is where jsoup has to stop. If you click "view source" anywhere, it won't show, e.g., flight "6186". If you expand the menu and right-click there, the flight number does show up.

The specific information I am searching for is this:

[image]

So I realized I need something like a headless browser to access all the content. But when trying to understand PhantomJS or HtmlUnit, I had huge problems getting started. I can't formulate what I need, and I'm confused by the features of headless browsers. Example code showing how to get this into parseable HTML would be very welcome.

Does anyone have experience with parsing elements like these? Thank you so much in advance.

best regards!

UPDATE, in answer to jPO's comment (other help is still greatly appreciated): here is what I see when I inspect the events:

[image]

UPDATE 2: Any ideas on how to do this? Maybe headless browsers?

ZedBrannigan
  • I have bad news for you. To parse this, you'd first need something that also loads and runs the JavaScript on the page. This doesn't look like the "static" HTML you are showing us. These lines are generated with JavaScript, and I presume you'd find an AJAX request in the network console in Firebug. So it might be easier to understand the AJAX request and retrieve the data directly from there. – jPO Jan 27 '15 at 15:18
  • Thanks, jPO, for having a look at this. Since I am not experienced with AJAX yet, how can I get started here? In Chrome I inspected the element, went to "events" and selected this (I updated my post to show a picture). Is that of any help? – ZedBrannigan Jan 27 '15 at 15:22
  • Sorry, but not really. I don't know what Chrome's console looks like, but there has to be a place where you can see all the traffic. If you can find that, then when you click the button that opens this "span" you should see the communication. After that, you can click the line in the console where that happened, and you should see the URL of that request. – jPO Jan 27 '15 at 15:26
  • Okay, I just found it. Try this: http://www.kayak.de/s/run/inlineDetails/flight?currentview=list&fs=checkedbags%3D0%3Bairports%3Dap0%3DMUC%2CBER%2CTXL%3Bcodeshare%3Dtrue%3Bbaditin%3Dtrue&localidx=104&resultid=ea63237698f20885f47a9b3827f9953b&searchid=kUECCUIWzU&suppressLookback=false and have a look at the parameters. There you'll find your flight number. – jPO Jan 27 '15 at 15:29
  • FWIW, I would compare the plain HTML (view source) with the HTML in the inspector. Also, in Chrome you can check whether content is being loaded via AJAX by opening the console, right-clicking and selecting "Log XMLHttpRequests". You will probably need to reload the page, though. It is possible that the information is only sent on the click. – lharby Jan 27 '15 at 15:32
  • That's great. So I would need to loop this for every flight displayed (15 per page)? I'm not sure how you got this link, since I can't reconstruct it in Chrome. – ZedBrannigan Jan 27 '15 at 15:32
  • @The_guy_with_noob_questions Yes! Or do it as I wrote in the answer, though obviously you wouldn't do it with JavaScript. That's okay. I only wanted to say that localidx needs to be changed for every line. And how did I get it? Exactly as I told you: find the network tab in the console you use (you still haven't said whether it is the default Chrome console or some kind of add-on), and there you'll see it in the traffic. – jPO Jan 27 '15 at 15:49
  • I use Chrome with the default "inspect" features. – ZedBrannigan Jan 27 '15 at 15:58
  • How did you solve this? http://stackoverflow.com/questions/33842318/can-not-get-web-element-in-selenium I have the same error. I tried to get it in Java in Eclipse, but for Android I don't know. –  Dec 01 '15 at 18:16
  • You say you can get this element with commands. But when I try to get the whole page source, it tells me to enable JS. Without seeing the page, how can it get the element? It can't when I try. –  Dec 01 '15 at 19:23

1 Answer

As an answer, have a look at this code.

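// Ask the server for the inline details of a single flight result.
$.ajax({
  url: "http://www.kayak.de/s/run/inlineDetails/flight",
  type: "post",
  dataType: "json",
  data: {
    localidx: 104,                                 // index of the result row
    resultid: "ea63237698f20885f47a9b3827f9953b",  // id of that result
    searchid: "kUECCUIWzU"                         // id of the current search
  }
});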

As you can see, there is a server waiting for your request. Working out the parameters sent to the server left me with a few options; I'm going to stick to these:

localidx is the id written in the "Details anzeigen" ("Show details") button.
resultid is also written inside that button.
searchid is written in the name attribute of an iframe with the id master-1.
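
To make that concrete, here is a minimal sketch in plain JavaScript of how the request data for each result row could be assembled from these three parameters. The helper names (buildDetailsParams, buildDetailsRequests) and the second row's resultid are made up for illustration; only the endpoint and the values for row 104 come from the request above. It only builds the data and does not send anything.

```javascript
// Endpoint taken from the request shown above.
const DETAILS_URL = "http://www.kayak.de/s/run/inlineDetails/flight";

// Encode the three parameters for one result row as a form/query string.
function buildDetailsParams(localidx, resultid, searchid) {
  return new URLSearchParams({
    localidx: String(localidx),
    resultid: resultid,
    searchid: searchid,
  }).toString();
}

// Build one request description per row; searchid is shared by all rows
// of the same search, localidx and resultid change from row to row.
function buildDetailsRequests(rows, searchid) {
  return rows.map((row) => ({
    url: DETAILS_URL,
    body: buildDetailsParams(row.localidx, row.resultid, searchid),
  }));
}

const requests = buildDetailsRequests(
  [
    { localidx: 104, resultid: "ea63237698f20885f47a9b3827f9953b" },
    // Made-up placeholder id, only here to show a second row.
    { localidx: 105, resultid: "00000000000000000000000000000000" },
  ],
  "kUECCUIWzU"
);

for (const request of requests) {
  console.log(request.url, request.body);
}
```

Whatever HTTP client you end up using (not necessarily JavaScript) would then send each body to the endpoint, one request per row.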

I hope you are friends with regexes, because that's what you'll have to face :/
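
As a warm-up for the regex part, here is a small sketch that pulls the three parameters out of a request URL like the ones that show up in the browser's network tab. The function name extractParam is my own invention, and the URL is a shortened form of such a request. The real page markup isn't shown here, so the pattern would have to be adapted to wherever the values actually sit in the HTML.

```javascript
// Return the value of one query parameter from a URL-like string,
// or null if the parameter is not present.
// Assumes `name` contains no regex special characters.
function extractParam(text, name) {
  const pattern = new RegExp("[?&]" + name + "=([^&#]*)");
  const match = pattern.exec(text);
  return match ? decodeURIComponent(match[1]) : null;
}

// Shortened sample URL for illustration.
const sampleUrl =
  "http://www.kayak.de/s/run/inlineDetails/flight" +
  "?localidx=104&resultid=ea63237698f20885f47a9b3827f9953b" +
  "&searchid=kUECCUIWzU&suppressLookback=false";

console.log(extractParam(sampleUrl, "localidx"));
console.log(extractParam(sampleUrl, "resultid"));
console.log(extractParam(sampleUrl, "searchid"));
```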

Good luck! Hope I helped!

jPO
  • Thank you so much for your help. Since I barely understand regex or what the next steps would be, I feel this task is too big for me to tackle :/ I REALLY appreciate your help! – ZedBrannigan Jan 27 '15 at 15:46
  • http://stackoverflow.com/questions/33842318/can-not-get-web-element-in-selenium What can I use for Android? I used PhantomJS in Eclipse. –  Dec 01 '15 at 18:16