
YouTube autocomplete

As shown in this image, I want to retrieve autocomplete search results using Jsoup. I can already retrieve the video URL, video title and thumbnail from a video ID, but I am stuck on retrieving them from the search results.

I have to do this using only Jsoup, without YouTube's Data API.

Any suggestions that can point me in the right direction would be appreciated.

marc_s
raj kavadia
  • YouTube uses JavaScript (disable JavaScript in your browser and check the site) - it cannot be done with Jsoup. – TDG Oct 11 '19 at 17:44

1 Answer


The search results are generated dynamically, via JavaScript. That means they cannot be parsed by Jsoup, because Jsoup only "sees" the static HTML that the server returns and does not execute scripts. However, we can get the results directly from the web service that supplies them.

YouTube's autocomplete search results are acquired from a web service (provided by Google). Every time we add a letter in the search bar, a request is made to that service in the background and the response is rendered on the page. We can discover such APIs with a browser's Developer Tools. For example, I found this API with the following procedure:

  • Open YouTube in a browser.
  • Open the Developer Tools (Ctrl + Shift + I).
  • Go to the Network tab. Here we can find detailed information about our browser's connections to web servers.
  • Add a letter in YouTube's search bar. At this point, we can see a new GET request to https://clients1.google.com/complete/search.
  • Click on that request and go to the box on the right, where we can examine the request-response more carefully. In the Headers tab, we see that the URL contains our search query; in the Response tab, the response body contains the autocomplete results.
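
The request observed in the steps above can be rebuilt by hand: the endpoint stays fixed and the URL-encoded search text goes into the `q` parameter. The following is only a minimal sketch. The class and method names are hypothetical, and it assumes that `client=youtube`, `ds=yt` and `q` are enough for the service to answer, which is not something the Network tab alone confirms; the browser sends more parameters.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not part of Jsoup or of any YouTube library.
class AutocompleteUrl {
    // Endpoint seen in the Network tab.
    static final String ENDPOINT = "https://clients1.google.com/complete/search";

    // Builds the GET URL for a query.
    // Assumption: the reduced parameter set (client, ds, q) is sufficient.
    static String build(String query) {
        try {
            // The query must be URL-encoded so spaces and '&' do not break the URL.
            return ENDPOINT + "?client=youtube&ds=yt&q=" + URLEncoder.encode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("java tutorial"));
    }
}
```

Pasting the printed URL into a browser is a quick way to check whether the service still answers in the same form.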

The response is a JavaScript snippet that contains our data in an array, and it can be parsed with regular expressions. Jsoup can be used for the HTTP request, but any HTTP client will do.

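import java.io.IOException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.jsoup.Connection.Response;
import org.jsoup.Jsoup;

// Returns every string the regex captures from the autocomplete response for the query.
public static ArrayList<String> autocompleteResults(String query) throws IOException {
    String url = "https://clients1.google.com/complete/search?client=youtube&hl=en&gs_rn=64&gs_ri=youtube&ds=yt&cp=10&gs_id=b2&q=";
    // Captures the quoted string that directly follows an opening bracket.
    String re = "\\[\"(.*?)\",";

    // Jsoup acts only as an HTTP client here; the query is URL-encoded before it is appended.
    Response resp = Jsoup.connect(url + URLEncoder.encode(query, "UTF-8")).execute();
    Matcher match = Pattern.compile(re, Pattern.DOTALL).matcher(resp.body());

    ArrayList<String> data = new ArrayList<>();
    while (match.find()) {
        data.add(match.group(1));
    }
    return data;
}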

The code was written and tested with VS Code and Java 8 on Windows, but it should also work in Android Studio.
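
The parsing step can also be tried offline, without a network call, by running the same regular expression over a hard-coded string. This is a rough sketch: the class name and the sample response are hypothetical, and the sample only assumes what a response of this kind might look like (a JavaScript call wrapping a nested array), so the real payload may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; it uses only the standard library.
class SuggestionParser {
    // Same pattern as in the answer: a quoted string right after '[' and before ','.
    static final Pattern ITEM = Pattern.compile("\\[\"(.*?)\",", Pattern.DOTALL);

    // Assumed shape of a response; not copied from a real request.
    static final String SAMPLE =
        "window.google.ac.h([\"jso\",[[\"jsoup tutorial\",0],[\"jsoup java\",0]],{\"q\":\"x\"}])";

    // Collects every captured string, in order of appearance.
    static List<String> parse(String body) {
        List<String> out = new ArrayList<>();
        Matcher m = ITEM.matcher(body);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [jso, jsoup tutorial, jsoup java]
        System.out.println(parse(SAMPLE));
    }
}
```

In this assumed sample the first match is the echoed query itself, because it also sits right after an opening bracket. If the real response has the same shape, callers may want to drop the first element of the list.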

t.m.adam
  • I have a question. Can we get everything using the method you provided, with a different URL? – raj kavadia Oct 12 '19 at 12:00
  • I'm afraid I don't understand. This function was designed specifically for YouTube autocomplete results and it will not work with any other URL. If you want to write a similar function for other sites, then yes, you would change the URL and the parsing part - most APIs return JSON so you would use `JsonObject`. But note that there is not always an API; most of the time you have to use an HTML parser. – t.m.adam Oct 12 '19 at 21:11
  • I've come up with a real trouble @t.m.adam. I'll be very glad if you take a look at [this post](https://stackoverflow.com/questions/59045550/cant-parse-the-username-to-make-sure-im-logged-in-to-a-website) to offer any solution. Thanks. – MITHU Nov 26 '19 at 08:15