
I am having problems scraping a Google search page. I can scrape a normal web page, but the same code does not work on a search results page.

I am using Node.js with the cheerio API. The search is for "restaurants near me". Here is the code:

```
var request = require('request');
var cheerio = require('cheerio');

request('https://www.google.com/search?rlz=1C1CHBF_enUS795US795&q=restaurants+near+me&npsic=0&rflfq=1&rlha=0&rllag=43556429,-83953443,247&tbm=lcl&ved=2ahUKEwikkKDVhvPcAhUEmuAKHekOBl4QjGp6BAgDEE8&tbs=lrf:!2m1!1e2!2m1!1e5!2m1!1e1!2m1!1e3!3sIAE,lf:1,lf_ui:9&rldoc=1#rlfi=hd:;si:;mv:!1m3!1d37687.821017182374!2d-83.91493835!3d43.58986!2m3!1f0!2f0!3f0!3m2!1i324!2i313!4f13.1;tbs:lrf:!2m1!1e2!2m1!1e5!2m1!1e1!2m1!1e3!3sIAE,lf:1,lf_ui:9',(error,response,html)=>{
  if(!error && response.statusCode==200){
    const $= cheerio.load(html)

    $(".dbg0pd").each(function(i, element){
      var a = $(this);
      console.log('This is from inside the loop')
      console.log(a.text());
    });

  }
})
```

In the browser, Inspect Element shows that the div with class `dbg0pd` contains the name of each restaurant. However, the `.each()` loop never fires: my test `console.log` statement does not print `This is from inside the loop`.

Running `node scrape.js` does not print anything.
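
One way to narrow this down is to look at the raw response body before it is parsed. The following is a minimal sketch, not a definitive check: `diagnoseHtml` is a hypothetical helper (it is not part of `request` or cheerio), and treating the text "did not match any documents" as the sign of an empty results page is an assumption.

```javascript
// Hypothetical helper: reports why a selector might match nothing,
// using only string checks on the raw response body.
function diagnoseHtml(html, className) {
  return {
    // Size of the response body in characters
    length: html.length,
    // True if the class name occurs anywhere in the raw markup
    hasClass: html.includes(className),
    // True if the body contains the assumed "no results" marker text
    noResults: html.includes('did not match any documents'),
  };
}

// Example with a stand-in string instead of a real response body
const sample = '<div class="other">Images</div><p>did not match any documents</p>';
console.log(diagnoseHtml(sample, 'dbg0pd'));
```

Calling `console.log(diagnoseHtml(html, 'dbg0pd'))` inside the `request` callback, before `cheerio.load(html)`, would show whether the class is present in what the server actually returned. If `hasClass` is `false`, the selector cannot match no matter how the loop is written.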

ggorlen
  • I tried manually pulling the page from Google and then doing your cheerio logic, and could not replicate your problem. Make sure you're not getting an error or non-200 response. – Paul Aug 17 '18 at 03:42
  • If I do `console.log($)` it will display the cheerio object. I do not have any response error. Did you use the same url as me? – green coder Aug 17 '18 at 04:27
  • Yes, copied from your question, but I pulled it through the browser. In the past, Google has blocked scraper-type software from its browser-friendly URLs. It does offer a search API that removes the need for scraping, though. – Paul Aug 17 '18 at 11:42
  • I changed the selector to match all div elements instead of that class. I suspect I cannot use cheerio on a Google search page: when I scraped a URL that displays restaurants in the browser using the code above, it returned divs like Images, Videos, Maps, etc., but instead of the restaurants I got this at the end: _did not match any documents. Reset search toolsSuggestions:Make sure all words are spelled correctly.Try different keywords.Try more general keywords.Try fewer keywords_ – green coder Aug 17 '18 at 21:18
  • Do you have any recommendation for a search API? – green coder Aug 17 '18 at 21:22
  • Depending on your needs: https://developers.google.com/custom-search/json-api/v1/overview – Paul Aug 17 '18 at 22:55
  • If you `console.log(html)` you'll see the selector and HTML aren't present. You're probably being detected as a bot. – ggorlen Nov 26 '22 at 01:43
  • Looking at it a few months later, `dbg0pd` doesn't appear in the static HTML and neither do restaurant names, so I think [How can I scrape pages with dynamic content using node.js?](https://stackoverflow.com/questions/28739098/how-can-i-scrape-pages-with-dynamic-content-using-node-js) applies. – ggorlen Jan 02 '23 at 18:15
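
For the Custom Search JSON API that Paul links to, a request might look like the sketch below. It assumes Node 18+ (for the global `fetch`), plus an API key and a search engine ID that you create yourself; `buildSearchUrl`, `searchTitles`, `YOUR_API_KEY` and `YOUR_ENGINE_ID` are illustrative names, not anything from the question. Note that this API returns web results, which may not be the same as the local restaurant listings shown on the page in the question.

```javascript
// Builds a request URL for the Custom Search JSON API.
// apiKey and engineId are placeholders you must supply yourself.
function buildSearchUrl(apiKey, engineId, query) {
  const url = new URL('https://www.googleapis.com/customsearch/v1');
  url.searchParams.set('key', apiKey);
  url.searchParams.set('cx', engineId);
  url.searchParams.set('q', query);
  return url.toString();
}

// Fetches one page of results and returns their titles.
// Assumes Node 18+ so that fetch is available globally.
async function searchTitles(apiKey, engineId, query) {
  const res = await fetch(buildSearchUrl(apiKey, engineId, query));
  if (!res.ok) throw new Error(`Search request failed: ${res.status}`);
  const data = await res.json();
  // `items` is missing from the response when there are no results
  return (data.items || []).map((item) => item.title);
}

// Print the URL that would be requested (no network call is made here)
console.log(buildSearchUrl('YOUR_API_KEY', 'YOUR_ENGINE_ID', 'restaurants near me'));
```

With real credentials, `searchTitles(key, id, 'restaurants near me').then(console.log)` would print the result titles. The response is JSON, so no HTML parsing or class selectors are involved.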
