
I'm currently using Jsoup to get the videoIDs of YouTube videos after performing a search. I'm trying to get the videoID from the href, using the following code:

import android.util.Log
import org.jsoup.Jsoup

val doc = Jsoup.connect("https://www.youtube.com/results")
    .data("search_query", s)   // s is the search term
    .get()

for (a in doc.select("a[href]")) {
    Log.d("MAIN", a.attr("abs:href"))
}

But currently, the logged links don't include the video links I'm looking for.

I thought YouTube might be giving me a basic response because I hadn't set a user agent, so I added one.

I tried adding the following options to the request, based on a previous Stack Overflow question:

val doc = Jsoup.connect("https://www.youtube.com/results")
    .data("search_query", s)
    .ignoreContentType(true)
    .userAgent("Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:25.0) Gecko/20100101 Firefox/25.0")
    .referrer("http://www.google.com")
    .timeout(12000)
    .followRedirects(true)
    .execute()
    .parse()

It still gives me the basic response. I tried comparing the output of Log.d("MAIN", doc.toString()) for both requests, but the only differences were in a meta tag and the nonce. For some reason I also wasn't getting the full string version of the doc in the log, so I couldn't make further comparisons.

How can I get the YouTube video links from the search results? (I want links of the form "watch?v=XXXXXXX".) If possible, I would like solutions in Kotlin.

KahngjoonK

2 Answers


If you look at the raw YouTube response (doc), you'll see that it contains a lot of JavaScript code (inside <script> tags). This code holds the instructions for building the HTML you see in your browser. But Jsoup is not a browser emulator: it can't execute JavaScript, so the a[href] elements you're looking for are simply not there.

You need to either use another tool or extract the data from the JavaScript yourself. Luckily, a simple regex is enough in this case:

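import org.jsoup.Jsoup

val doc = Jsoup.connect("https://www.youtube.com/results").data("search_query", s).get()
// Matches "videoId":"..." in the embedded data and captures the ID in group 1
val regex = "\"videoId\":\"(\\w+)\"".toRegex()
val videoIds = doc.select("script")
    .map { it.data() }                // inline script contents (empty for <script src=...>)
    .filterNot { it.isEmpty() }
    .flatMap { regex.findAll(it) }    // every videoId occurrence in every script
    .map { it.groupValues[1] }        // the captured ID itself
    .distinct()                       // each ID appears several times in the payload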
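To try the extraction step without a network call, here is a minimal stdlib-only sketch that takes the script bodies as plain strings. The helper names `extractVideoIds` and `toWatchUrls` are hypothetical, and the `https://www.youtube.com/watch?v=` prefix is assumed to be the link form the question asks for.

```kotlin
// Same pattern as above: captures the value of every "videoId" key.
val videoIdRegex = "\"videoId\":\"(\\w+)\"".toRegex()

/**
 * Hypothetical helper: extracts distinct video IDs from the text of several
 * <script> elements (with Jsoup these would come from doc.select("script").map { it.data() }).
 */
fun extractVideoIds(scripts: List<String>): List<String> =
    scripts
        .filterNot { it.isEmpty() }                       // skip scripts with no inline body
        .flatMap { videoIdRegex.findAll(it).toList() }    // every match in every script
        .map { it.groupValues[1] }                        // group 1 is the ID itself
        .distinct()                                       // drop repeated IDs, keep first-seen order

/** Hypothetical helper: turns IDs into "watch?v=" links (URL prefix is an assumption). */
fun toWatchUrls(ids: List<String>): List<String> =
    ids.map { "https://www.youtube.com/watch?v=$it" }
```

With Jsoup you would call it as `toWatchUrls(extractVideoIds(doc.select("script").map { it.data() }))`. Keeping the regex step independent of Jsoup makes it easy to test against a saved copy of the response.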

I can't comment on the accepted answer, but I'd like to add that some YouTube video IDs contain "-", so you'll need to change the regex to allow it (inside square brackets "|" is a literal character, not "or", so it isn't needed):

val regex = "\"videoId\":\"([\\w-]+)\"".toRegex()
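A quick way to see the difference between the two patterns is to run both on the same string. This sketch uses a made-up ID, `ab-cd_EF12`, inside a JSON-like fragment; `firstId` is a hypothetical helper, not part of any library.

```kotlin
// \w covers letters, digits and underscore, but not the hyphen some IDs contain.
val narrowRegex = "\"videoId\":\"(\\w+)\"".toRegex()
// Inside a character class '|' is a literal, so [\w-] is all that is needed.
val widerRegex = "\"videoId\":\"([\\w-]+)\"".toRegex()

/** Hypothetical helper: returns the first captured ID, or null if the pattern never matches. */
fun firstId(regex: Regex, text: String): String? =
    regex.find(text)?.groupValues?.get(1)
```

With `\w+` the hyphenated ID is not truncated but skipped entirely, because the pattern requires the closing quote right after the matched characters; `firstId(narrowRegex, ...)` therefore returns null for it.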
mhkdepauw