I'm trying to scrape a Google Shopping query for the attributes of the search results (HTML) with Jsoup. Before doing anything with the results, I wanted to make sure I was actually getting the proper HTML from Jsoup, so I added a System.out.println(doc.toString()); to my AsyncTask to see what I was working with. As I suspected, the resulting HTML was not complete. Here is the code I was running, followed by its result:
(The search query is hard-coded to "scarf walmart" for testing purposes.)
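import android.os.AsyncTask;
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import java.io.IOException;

public class fetcher extends AsyncTask<Void, Void, Integer> {
    // Parsed page; stays null if the request fails.
    private Document doc;

    @Override
    protected Integer doInBackground(Void... voids) {
        try {
            // Fetch the shopping results page with a desktop browser user agent.
            Connection.Response response = Jsoup.connect("https://www.google.ca/search?q=scarf+walmart&tbm=shop")
                    .ignoreContentType(true)
                    .userAgent("Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36")
                    .referrer("http://www.google.ca")
                    .timeout(12000)
                    .followRedirects(true)
                    .execute();
            doc = response.parse();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return 1;
    }

    @Override
    protected void onPostExecute(Integer integer) {
        // Guard against a failed fetch, which would leave doc null.
        if (doc != null) {
            System.out.println(doc.toString());
        }
    }
}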
This gave me some seemingly good results, except that I am only getting some of the 20+ search results on the first page. I suspect that this may have something to do with my userAgent value, but I'm not sure how I would go about fixing that.
Edit -> Every time I run the app, I get a different number of search results showing up in the source code.
So my question is: how do I get all of the Google search results (or at least a consistent number of them) to show up when I fetch the HTML using Jsoup?
Any help is appreciated!
Update 2: I've experimented with my code and tried commenting out the referrer, timeout, and followRedirects lines:
protected Integer doInBackground(Void... voids) {
    try {
        Connection.Response response = Jsoup.connect("https://www.google.ca/search?q=scarf+walmart&tbm=shop")
                .ignoreContentType(true)
                .userAgent("Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36")
                //.referrer("http://www.google.ca")
                //.timeout(100000)
                //.followRedirects(true)
                .execute();
        doc = response.parse();
    } catch (IOException e) {
        e.printStackTrace();
    }
    return 1;
}
I'm now certainly getting a LOT more results, but it's still skipping some. Any ideas?
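To compare how many results each run actually returns, one option is to count the price-like strings in the fetched HTML rather than eyeballing the log. The following is only a minimal sketch: PriceCounter and findPrices are hypothetical names that are not part of the app above, and it assumes every result contains a price formatted like $12.97 (a dollar sign, digits, and two decimals). In the app, the input could be the string from doc.toString().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the app above: extracts price-like tokens
// such as "$12.97" from fetched HTML so separate runs can be compared.
class PriceCounter {
    // Assumption: prices appear as a dollar sign, digits, a dot, and two decimals.
    private static final Pattern PRICE = Pattern.compile("\\$\\d+\\.\\d{2}");

    // Returns every price-like token in document order (duplicates kept).
    static List<String> findPrices(String html) {
        List<String> prices = new ArrayList<>();
        if (html == null) {
            return prices; // a failed fetch yields no prices
        }
        Matcher m = PRICE.matcher(html);
        while (m.find()) {
            prices.add(m.group());
        }
        return prices;
    }

    public static void main(String[] args) {
        // Stand-in for the real page source.
        String sample = "<div>$12.97 from Walmart.ca</div><div>$7.97 from Walmart.ca</div>";
        List<String> prices = findPrices(sample);
        System.out.println(prices.size() + " prices: " + prices);
    }
}
```

Logging the size of the returned list on each run gives a single number to compare, though it will overcount if the page repeats a price in more than one element.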
Results Update: (Out of 40 Results)
Using the JavaScript interface: (20 Results)
09:40:11.909 18077-18077/com.painlessshopping.mohamed.findit I/System.out: >$12.97
09:40:11.909 18077-18077/com.painlessshopping.mohamed.findit I/System.out: >$12.97
09:40:11.910 18077-18077/com.painlessshopping.mohamed.findit I/System.out: >$19.97
09:40:11.910 18077-18077/com.painlessshopping.mohamed.findit I/System.out: $12.97
09:40:11.910 18077-18077/com.painlessshopping.mohamed.findit I/System.out: $19.97
09:40:11.910 18077-18077/com.painlessshopping.mohamed.findit I/System.out: $29.97
09:40:11.910 18077-18077/com.painlessshopping.mohamed.findit I/System.out: $29.97
09:40:11.911 18077-18077/com.painlessshopping.mohamed.findit I/System.out: $14.97
09:40:11.911 18077-18077/com.painlessshopping.mohamed.findit I/System.out: $7.97
09:40:11.911 18077-18077/com.painlessshopping.mohamed.findit I/System.out: $7.97
09:40:11.911 18077-18077/com.painlessshopping.mohamed.findit I/System.out: $12.97
09:40:11.911 18077-18077/com.painlessshopping.mohamed.findit I/System.out: $12.97
09:40:11.912 18077-18077/com.painlessshopping.mohamed.findit I/System.out: $12.97
09:40:11.912 18077-18077/com.painlessshopping.mohamed.findit I/System.out: $16.97
09:40:11.912 18077-18077/com.painlessshopping.mohamed.findit I/System.out: $19.97
09:40:11.912 18077-18077/com.painlessshopping.mohamed.findit I/System.out: $16.97
09:40:11.912 18077-18077/com.painlessshopping.mohamed.findit I/System.out: $14.97
09:40:11.913 18077-18077/com.painlessshopping.mohamed.findit I/System.out: $12.97
09:40:11.913 18077-18077/com.painlessshopping.mohamed.findit I/System.out: $14.97
09:40:11.913 18077-18077/com.painlessshopping.mohamed.findit I/System.out: $14.97
Using the above code: (Ranges from ~20 to 36 Results)
11-20 10:05:23.540 16788-16855/com.painlessshopping.mohamed.findit I/System.out: $12.97 from Walmart.ca
11-20 10:05:23.540 16788-16855/com.painlessshopping.mohamed.findit I/System.out: $12.97 from Walmart.ca
11-20 10:05:23.540 16788-16855/com.painlessshopping.mohamed.findit I/System.out: $12.97 from Walmart.ca
11-20 10:05:23.541 16788-16855/com.painlessshopping.mohamed.findit I/System.out: $19.97 from Walmart.ca
11-20 10:05:23.541 16788-16855/com.painlessshopping.mohamed.findit I/System.out: $16.97 from Walmart.ca
11-20 10:05:23.541 16788-16855/com.painlessshopping.mohamed.findit I/System.out: $29.97 from Walmart.ca
11-20 10:05:23.542 16788-16855/com.painlessshopping.mohamed.findit I/System.out: $19.97 from Walmart.ca
11-20 10:05:23.542 16788-16855/com.painlessshopping.mohamed.findit I/System.out: $19.99 from Walmart.ca
11-20 10:05:23.542 16788-16855/com.painlessshopping.mohamed.findit I/System.out: $29.97 from Walmart.ca
11-20 10:05:23.543 16788-16855/com.painlessshopping.mohamed.findit I/System.out: $19.97 from Walmart.ca
11-20 10:05:23.543 16788-16855/com.painlessshopping.mohamed.findit I/System.out: $16.97 from Walmart.ca
11-20 10:05:23.544 16788-16855/com.painlessshopping.mohamed.findit I/System.out: $39.97 from Walmart.ca
11-20 10:05:23.545 16788-16855/com.painlessshopping.mohamed.findit I/System.out: $14.97 from Walmart.ca
11-20 10:05:23.545 16788-16855/com.painlessshopping.mohamed.findit I/System.out: $12.97 from Walmart.ca
11-20 10:05:23.546 16788-16855/com.painlessshopping.mohamed.findit I/System.out: $7.97 from Walmart.ca
11-20 10:05:23.546 16788-16855/com.painlessshopping.mohamed.findit I/System.out: $14.97 from Walmart.ca
11-20 10:05:23.547 16788-16855/com.painlessshopping.mohamed.findit I/System.out: $29.98 from Walmart.ca
11-20 10:05:23.547 16788-16855/com.painlessshopping.mohamed.findit I/System.out: $7.97 from Walmart.ca
11-20 10:05:23.547 16788-16855/com.painlessshopping.mohamed.findit I/System.out: $14.97 from Walmart.ca
11-20 10:05:23.548 16788-16855/com.painlessshopping.mohamed.findit I/System.out: $12.97 from Walmart.ca
11-20 10:05:23.548 16788-16855/com.painlessshopping.mohamed.findit I/System.out: $7.97 from Walmart.ca
11-20 10:05:23.548 16788-16855/com.painlessshopping.mohamed.findit I/System.out: $7.97 from Walmart.ca
11-20 10:05:23.549 16788-16855/com.painlessshopping.mohamed.findit I/System.out: $7.97 from Walmart.ca
11-20 10:05:23.549 16788-16855/com.painlessshopping.mohamed.findit I/System.out: $9.97 from Walmart.ca
11-20 10:05:23.550 16788-16855/com.painlessshopping.mohamed.findit I/System.out: $12.97 from Walmart.ca
11-20 10:05:23.550 16788-16855/com.painlessshopping.mohamed.findit I/System.out: $12.98 from Walmart.ca
11-20 10:05:23.550 16788-16855/com.painlessshopping.mohamed.findit I/System.out: $22.97 from Walmart.ca
11-20 10:05:23.551 16788-16855/com.painlessshopping.mohamed.findit I/System.out: $6.87 from Etsy - ashton11
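To see which results one method returns and the other skips, the two price lists can be compared as multisets, since the same price shows up several times. This is a rough sketch under assumptions: RunDiff, tally, and missing are hypothetical names, and each result is identified only by its price string, which cannot tell two different products at the same price apart.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for comparing two runs by how often each price occurs.
class RunDiff {
    // Counts occurrences of each item; TreeMap keeps the keys sorted.
    static Map<String, Integer> tally(List<String> items) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String item : items) {
            counts.merge(item, 1, Integer::sum);
        }
        return counts;
    }

    // Returns the items that occur more often in reference than in other,
    // mapped to how many occurrences are missing from other.
    static Map<String, Integer> missing(List<String> reference, List<String> other) {
        Map<String, Integer> otherCounts = tally(other);
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, Integer> e : tally(reference).entrySet()) {
            int diff = e.getValue() - otherCounts.getOrDefault(e.getKey(), 0);
            if (diff > 0) {
                result.put(e.getKey(), diff);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample lists, not the real output of either method.
        List<String> runA = Arrays.asList("$12.97", "$12.97", "$19.97");
        List<String> runB = Arrays.asList("$12.97", "$29.97");
        System.out.println(missing(runA, runB)); // prints {$12.97=1, $19.97=1}
    }
}
```

Calling missing in both directions shows what each method has that the other lacks.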