
Using Jsoup, I am able to extract the page source of most websites (the same thing you get by right-clicking a webpage and choosing "View Page Source"). But for any YouTube video page, I am unable to extract it: the result is not the proper page source. I tried the following code, but it failed to extract the page.

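import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class App {
  public static void main(String[] args) throws IOException {

    String webUrl = "https://www.youtube.com/watch?v=Zu6o23Pu0Do";
    // Fetch and parse the page, sending a browser-like User-Agent header
    Document doc = Jsoup.connect(webUrl)
            .userAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36")
            .get();

    // Print the parsed document as HTML
    System.out.println(doc);

  }
}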

Does anybody have any advice on how to fix this?

I am getting output like the following:

sample output

Funny Boss
  • Is the connection timing out? Are you getting an error? – Joe Z Jan 02 '20 at 13:17
  • No, the connection is not timing out, and there is no error. I'm just getting unusual data that is not in the original page. – Funny Boss Jan 03 '20 at 15:07
  • I just ran your code in my IDE and it came back with the document. Check out my pastebin. Could you paste all of your code into one as well and append it to your question? The image you posted is very hard to read. - https://pastebin.com/QqY2Lp69 – Joe Z Jan 03 '20 at 15:25
  • I added my full code, and I am getting the following output. The URL is here - https://pastebin.com/jRkiu3Mt – Funny Boss Jan 03 '20 at 22:06
  • I am also facing the same issue. I am getting an empty title while trying to fetch metadata for YouTube pages. @FunnyBoss – sai prashanth May 11 '20 at 05:19

1 Answer


You're not setting a user agent, which could be triggering anti-scraping measures on the website. I'm going to assume the problem is that your connection is timing out when you run this. Try chaining the following user agent off connect() and see if it works for you.

.userAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36")
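
To check whether the user agent makes a difference independently of Jsoup's parsing, it can help to look at the raw response body, which is what the browser shows under "View Page Source". The following is only a minimal sketch, assuming Java 11 or later and using the JDK's built-in java.net.http client instead of Jsoup. The class name RawPageSource, the helper names buildRequest and fetch, and the 10-second timeouts are made up for illustration and are not part of the original question or of Jsoup.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper class, not part of Jsoup or of the question's code.
public class RawPageSource {

    // Same browser-like User-Agent string as in the snippet above.
    static final String USER_AGENT =
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_2) AppleWebKit/537.36 "
            + "(KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36";

    // Builds a GET request that carries the User-Agent header and a request timeout.
    static HttpRequest buildRequest(String url, Duration timeout) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent", USER_AGENT)
                .timeout(timeout)
                .GET()
                .build();
    }

    // Sends the request and returns the response body as an unparsed String.
    static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .connectTimeout(Duration.ofSeconds(10)) // assumed value
                .build();
        HttpResponse<String> response = client.send(
                buildRequest(url, Duration.ofSeconds(10)),
                HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // Only touch the network when a URL is passed on the command line.
        if (args.length == 0) {
            System.out.println("usage: java RawPageSource <url>");
            return;
        }
        String html = fetch(args[0]);
        System.out.println(html.length() + " characters received");
        System.out.println(html);
    }
}
```

Running it as `java RawPageSource "https://www.youtube.com/watch?v=Zu6o23Pu0Do"` prints the body exactly as the server sent it. If that output already differs from what the browser shows, the difference comes from the server's response rather than from how Jsoup parses or prints the document.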

Joe Z