
I'm trying to parse a video site, cinemaonline.kg, to grab the link to a video file. First I saved the page as opened in the browser and looked at it in Notepad. I found:

[a id="onlineplayer" onmouseover="jQuery('a#onlineplayer').fancybox({'width' : '8', 'height' : 430, 'autoScale' : true, 'transitionIn' : 'none', 'transitionOut' : 'none', 'type' : 'iframe' , 'closeClick' : 'false' , 'hideOnOverlayClick':false, 'hideOnContentClick':false});" onclick="window.ui.hitMovie(74);window.ui.setFileDownloaded(74);" class="minibutton" href="http://cinemaonline.kg/pl.php?player=ftp&uid=1953&movieid=74&fileid=74&v=6b576ed87c32f85f9252e80591ca1228">[span]Смотреть[/span][/a]

(The < and > characters are replaced with [ and ] in these snippets because they were not displayed.)

So I tried to grab it with jsoup, but it threw a NullPointerException. I looked at the page text returned as a String, and there was no [a id="onlineplayer" ...] tag in it. I thought that maybe the element is always generated by a script such as this one:

[a id=\"onlineplayer\" onmouseover=\"jQuery(\'a#onlineplayer\').fancybox({\'width\' : \'8\', \'height\' : 430, \'autoScale\' : true, \'transitionIn\' : \'none\', \'transitionOut\' : \'none\', \'type\' : \'iframe\' , \'closeClick\' : \'false\' , \'hideOnOverlayClick\':false, \'hideOnContentClick\':false});\" onclick=\"window.ui.hitMovie(${movie.movie_id});window.ui.setFileDownloaded(${file.file_id});\" class=\"minibutton\" href=\"${file.links.license|escape}\"][span]Смотреть[/span][/a]

Then I tried to parse it with HtmlUnit:

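// Imports needed by the snippet (the statements below go inside a method)
import java.io.IOException;
import java.net.MalformedURLException;
import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

String url = "http://cinemaonline.kg/#/movie/id/74";
WebClient webClient = new WebClient();
webClient.setJavaScriptEnabled(true);
// Do not abort on HTTP error status codes or on script errors
webClient.setThrowExceptionOnFailingStatusCode(false);
webClient.setThrowExceptionOnScriptError(false);
HtmlPage page = null;
try {
    page = webClient.getPage(url);
} catch (FailingHttpStatusCodeException e1) {
    e1.printStackTrace();
} catch (MalformedURLException e1) {
    e1.printStackTrace();
} catch (IOException e1) {
    e1.printStackTrace();
}
// Give background JavaScript (for example AJAX calls) up to 10 seconds to finish
webClient.waitForBackgroundJavaScript(10000);
// Print before closing the windows, and only if the page was actually loaded
if (page != null) {
    System.out.println(page.asXml());
}
webClient.closeAllWindows();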

But it returned the same text that jsoup returned. I know that the page uses JavaScript and AJAX(?), but I don't really know how it works. How can I get the generated markup? Please help.

Rinomancer

1 Answer


See the question "Jsoup: how to get an image's absolute url?" for how to grab an image link.

Similarly, you can select the video link element and call yourvideoelement.attr("href") on it to get the link.
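
As a minimal sketch of that idea, the block below pulls the href out of the a#onlineplayer element in an HTML string. To stay free of dependencies it uses a standard-library regular expression as a stand-in for jsoup. The class name OnlinePlayerLink, the method extractHref, and the shortened sample markup are hypothetical. It only helps once the HTML really contains the rendered element, which is not the case for the raw page described in the question.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OnlinePlayerLink {
    // Matches an <a ...> start tag that carries id="onlineplayer"
    // and captures the value of its href attribute.
    private static final Pattern LINK = Pattern.compile(
        "<a\\b(?=[^>]*\\bid=\"onlineplayer\")[^>]*?\\bhref=\"([^\"]*)\"",
        Pattern.CASE_INSENSITIVE);

    // Returns the href of the first matching link, or empty if there is none.
    static Optional<String> extractHref(String html) {
        Matcher m = LINK.matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Shortened, hypothetical version of the element shown in the question.
        String html = "<a id=\"onlineplayer\" class=\"minibutton\" "
            + "href=\"http://cinemaonline.kg/pl.php?player=ftp&uid=1953&movieid=74&fileid=74\">"
            + "<span>Watch</span></a>";
        System.out.println(extractHref(html).orElse("link not found"));
    }
}
```

With jsoup itself, the equivalent lookup would be along the lines of Jsoup.parse(html).select("a#onlineplayer").attr("href").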

ajan
  • No, the problem is that I can't find the video element in the page without using a browser (Google Chrome). When I fetch the HTML, it does not contain the video element. That is why I suggested that the element is generated only when the page is opened in a browser. – Rinomancer Feb 23 '13 at 06:57
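
One contributing factor, offered as an assumption rather than a diagnosis of this site: the part of a URL after # is a fragment, and HTTP clients do not send it to the server. A plain fetch of http://cinemaonline.kg/#/movie/id/74 therefore requests only /, and whatever depends on /movie/id/74 has to be built on the client by the page's script. The sketch below uses java.net.URI to show that split. The class name FragmentDemo and its two helper methods are hypothetical.

```java
import java.net.URI;

public class FragmentDemo {
    // Returns the path (plus query, if any) that goes into the HTTP request line.
    static String requestTarget(String url) {
        URI uri = URI.create(url);
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        String query = uri.getRawQuery();
        return query == null ? path : path + "?" + query;
    }

    // Returns the fragment, which stays on the client and is never sent.
    static String fragment(String url) {
        return URI.create(url).getRawFragment();
    }

    public static void main(String[] args) {
        String url = "http://cinemaonline.kg/#/movie/id/74";
        System.out.println("sent to server: " + requestTarget(url));
        System.out.println("kept in client: " + fragment(url));
    }
}
```

This is why an HTML-only fetch of that URL can contain the script template but not the filled-in link.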