I am using Cheerio and request for web scraping. The code below runs without any error, but it does not print the innerText of the div with that class name.

I am a beginner with this technology, so I cannot figure out what I am missing.

request(baseurl, function(err,resp,body) {
  if (!err && resp.statusCode == 200) {     
    var $ = cheerio.load(body);
    $('div.class','#EIGTDNC-d-W EIGTDNC-d-Lb EIGTDNC-d-S EIGTDNC-d-mb EIGTDNC-d-bc').each(function() {
      temp = this.attr('innerText');
      console.log(temp);
    });

    // send the message back to user
  }
  else {
    console.log('error:', err); 
    console.log('statusCode:', resp && resp.statusCode); 
  }
});
//dom closed
Jeroen Heier
ttripdee
  • `innerText` is not an attribute; it is a DOM object property. So it would be `this.innerText` if it is a DOM node, or `this.text()` if it is a cheerio object – Patrick Evans Aug 07 '17 at 18:37
  • And please watch your line indentation for better readability. – lumio Aug 07 '17 at 18:41
  • @PatrickEvans thanks for pointing that out. But I am still not able to get the div with that class name. What worries me more is whether the complete DOM is loaded before execution reaches that statement. When I print console.log(body), the DOM I get is different from the actual page's DOM; it is mostly headers, script code, etc. I have installed NodeJS, Request and Cheerio, and I am running it through CMD on Windows. I checked scraping examples from the web that scrape a div, a paragraph, etc., but none of them seem to work for me on this specific site URL. – ttripdee Aug 08 '17 at 03:34
  • If the HTML that you get is different from what you see when actually visiting the site, then the site is probably generated through JavaScript, in which case a plain HTTP request for the page will not be enough. You will need some kind of headless browser such as PhantomJS, or a similar method. There are a [few questions on SO that help answer those types of problems](https://stackoverflow.com/search?q=%5Bjavascript%5D+dynamic+web+scraping) – Patrick Evans Aug 08 '17 at 03:52
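
One quick way to test the point above is to check whether the raw `body` string mentions the target class at all before suspecting the selector. The sketch below is only a rough heuristic built on a regular expression, not an HTML parser; the function name `htmlMentionsClass` and the two sample strings are made up for illustration.

```javascript
// Rough check: does any class="..." attribute in the raw HTML contain
// `className` as a whole, space-separated token?
// Heuristic only: it can be fooled by markup inside scripts or comments.
function htmlMentionsClass(html, className) {
  const classAttr = /class\s*=\s*(["'])([\s\S]*?)\1/gi;
  let match;
  while ((match = classAttr.exec(html)) !== null) {
    if (match[2].split(/\s+/).includes(className)) {
      return true;
    }
  }
  return false;
}

// Hypothetical sample responses: one with the markup present in the HTML,
// one where the page body would be filled in later by a script.
const staticPage = '<div class="EIGTDNC-d-W EIGTDNC-d-Lb">Hello</div>';
const scriptOnlyPage =
  '<html><head><script src="app.js"></script></head><body></body></html>';

console.log(htmlMentionsClass(staticPage, 'EIGTDNC-d-W'));
console.log(htmlMentionsClass(scriptOnlyPage, 'EIGTDNC-d-W'));
```

If the check fails on the `body` that request returns, no selector will find the div, and the page most likely has to be rendered by a browser engine first.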

1 Answer

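innerText is not an attribute of the HTML element, so attr('innerText') returns nothing.

Try retrieving the text with Cheerio's text() method instead. Inside each(), this is the raw element, so wrap it with $ first:
temp = $(this).text()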
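
The selector in the question is also worth a second look: `$('div.class', '#EIGTDNC-d-W ...')` searches for divs with the literal class `class` inside an element whose id is `EIGTDNC-d-W`, which is probably not what was intended. To match a div that carries all of those classes, CSS chains them with dots. Below is a minimal sketch of building such a selector from a class attribute value; the helper name `classSelector` is made up for illustration, and it assumes the class names are plain identifiers that need no escaping.

```javascript
// Build a CSS selector matching `tag` elements that carry ALL given classes.
// `classAttr` is the raw, space-separated value of an HTML class attribute.
// Assumption: class names contain no characters that require CSS escaping.
function classSelector(tag, classAttr) {
  const classes = classAttr.trim().split(/\s+/).filter(Boolean);
  return tag + classes.map((name) => '.' + name).join('');
}

const selector = classSelector(
  'div',
  'EIGTDNC-d-W EIGTDNC-d-Lb EIGTDNC-d-S EIGTDNC-d-mb EIGTDNC-d-bc'
);

console.log(selector);
```

With Cheerio loaded as `$`, the result could then be used as `$(selector).each(function () { console.log($(this).text()); });`. This only finds something if the div is present in the HTML that request actually returns.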

WISeAgent
  • It does not work like innerText, because line breaks are preserved. See https://github.com/cheeriojs/cheerio/issues/839 – Daniel Aug 11 '20 at 10:29