The search results are generated dynamically by JavaScript, so they cannot be parsed by Jsoup, which only "sees" the static HTML embedded in the page. However, we can get the results directly from the API.
YouTube's autocomplete search results are acquired from a web service (provided by Google). Every time we type a letter in the search bar, a request is made to that service in the background and the response is rendered on the page. We can discover such APIs with the Developer Tools of a browser. For example, I found this API with the following procedure:
- Open YouTube in a browser.
- Open the Developer Tools (Ctrl + Shift + I).
- Go to the Network tab. Here we can find detailed information about our browser's connections to web servers.
- Add a letter in YouTube's search bar. At this point, we can see a new GET request to https://clients1.google.com/complete/search.
- Click on that request and go to the box on the right, where we can examine the request and response more carefully. In the Headers tab, we see that the URL contains our search query; in the Response tab, the response body contains the autocomplete results.
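The URL seen in the Headers tab can be rebuilt by hand, which makes it easier to see where the search query goes. The following is a minimal sketch, assuming the same parameter string that was captured from the browser; the class and method names are hypothetical and only serve the illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper; the class and method names are made up for illustration.
class AutocompleteUrl {
    // Endpoint and parameters as captured in the Network tab.
    static final String BASE =
        "https://clients1.google.com/complete/search"
        + "?client=youtube&hl=en&gs_rn=64&gs_ri=youtube&ds=yt&cp=10&gs_id=b2&q=";

    // Appends the URL-encoded search query to the captured base URL.
    static String build(String query) {
        try {
            return BASE + URLEncoder.encode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Spaces are encoded as '+', so the URL ends with "&q=java+tutorial".
        System.out.println(build("java tutorial"));
    }
}
```

Encoding the query matters because characters such as spaces or `&` would otherwise change the meaning of the URL.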
The response is a JavaScript snippet that contains our data in an array, and it can be parsed with regular expressions. Jsoup can be used for the HTTP request, but any HTTP client will do.
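    import java.io.IOException;
    import java.net.URLEncoder;
    import java.util.ArrayList;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    import org.jsoup.Connection.Response;
    import org.jsoup.Jsoup;

    // UnsupportedEncodingException is a subclass of IOException, so one throws clause covers both.
    public static ArrayList<String> autocompleteResults(String query) throws IOException {
        String url = "https://clients1.google.com/complete/search?client=youtube&hl=en&gs_rn=64&gs_ri=youtube&ds=yt&cp=10&gs_id=b2&q=";
        // Capture the text between [" and ", in the response body.
        String re = "\\[\"(.*?)\",";
        // Fetch the raw response body; the query must be URL-encoded.
        Response resp = Jsoup.connect(url + URLEncoder.encode(query, "UTF-8")).execute();
        Matcher match = Pattern.compile(re, Pattern.DOTALL).matcher(resp.body());
        ArrayList<String> data = new ArrayList<String>();
        while (match.find()) {
            data.add(match.group(1));
        }
        return data;
    }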
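To see what the regular expression picks out without making a network call, it can be run against a hard-coded string. This is only a sketch: the sample below is a made-up response written to resemble the array-in-a-JavaScript-call shape described above, the real payload may differ, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; it applies the same pattern as above to a fixed string.
class AutocompleteParseDemo {
    // Capture the text between [" and ",
    static final Pattern ITEM = Pattern.compile("\\[\"(.*?)\",", Pattern.DOTALL);

    // Collects every captured group found in the given response body.
    static List<String> extract(String body) {
        List<String> data = new ArrayList<>();
        Matcher m = ITEM.matcher(body);
        while (m.find()) {
            data.add(m.group(1));
        }
        return data;
    }

    public static void main(String[] args) {
        // Made-up sample, NOT a recorded response from the service.
        String sample = "window.google.ac.h([\"jav\",[[\"java\",0],[\"java tutorial\",0]]])";
        System.out.println(extract(sample)); // prints [jav, java, java tutorial]
    }
}
```

With this sample, the first match is the echoed query ("jav") and the suggestions follow it, so a caller may want to check whether the first element should be dropped.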
The code was written and tested with VS Code and Java 8 on Windows, but it should also work in Android Studio.