I'm trying to get all the child elements of a specific div that I have already selected.

But after calling children(), I only get a single item instead of the full list of children.

Here is the source:

[image: screenshot of the page source]

And here is my code:

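> const cheerio = require('cheerio');
>
> const html = res.data; // HTML string from the HTTP response fetched earlier
> const $ = cheerio.load(html);
> const data = $('div[id="thepics"]'); // select the container div by its id
> console.log(data);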

And its result:

LoadedCheerio {
  '0': <ref *1> Node {
    type: 'tag',
    name: 'div',
    namespace: 'http://www.w3.org/1999/xhtml',
    attribs: [Object: null prototype] {
      style: 'margin:-5px 0px 10px 0px',
      id: 'thepics'
    },
    'x-attribsNamespace': [Object: null prototype] { style: undefined, id: undefined },
    'x-attribsPrefix': [Object: null prototype] { style: undefined, id: undefined },
    children: [ [Node] ],
    parent: Node {
      type: 'tag',
      name: 'div',
      namespace: 'http://www.w3.org/1999/xhtml',
      attribs: [Object: null prototype],
      'x-attribsNamespace': [Object: null prototype],
      'x-attribsPrefix': [Object: null prototype],
      children: [Array],
      parent: [Node],
      prev: [Node],
      next: [Node]
    },
    prev: Node {
      type: 'tag',
      name: 'div',
      namespace: 'http://www.w3.org/1999/xhtml',
      attribs: [Object: null prototype],
      'x-attribsNamespace': [Object: null prototype],
      'x-attribsPrefix': [Object: null prototype],
      children: [],
      parent: [Node],
      prev: [Node],
      next: [Circular *1]
    },
    next: Node {
      type: 'tag',
      name: 'div',
      namespace: 'http://www.w3.org/1999/xhtml',
      attribs: [Object: null prototype],
      'x-attribsNamespace': [Object: null prototype],
      'x-attribsPrefix': [Object: null prototype],
      children: [Array],
      parent: [Node],
      prev: [Circular *1],
      next: [Node]
    }
  },
  length: 1,
  options: { xml: false, decodeEntities: true },
  _root: <ref *2> LoadedCheerio {
    '0': Node {
      type: 'root',
      name: 'root',
      parent: null,
      prev: null,
      next: null,
      children: [Array],
      'x-mode': 'no-quirks'
    },
    length: 1,
    options: { xml: false, decodeEntities: true },
    _root: [Circular *2]
  },
  prevObject: <ref *2> LoadedCheerio {
    '0': Node {
      type: 'root',
      name: 'root',
      parent: null,
      prev: null,
      next: null,
      children: [Array],
      'x-mode': 'no-quirks'
    },
    length: 1,
    options: { xml: false, decodeEntities: true },
    _root: [Circular *2]
  }
}

And when I log data.children(), the result is:

LoadedCheerio {
  '0': Node {
    type: 'script',
    name: 'script',
    namespace: 'http://www.w3.org/1999/xhtml',
    attribs: [Object: null prototype] {},
    'x-attribsNamespace': [Object: null prototype] {},
    'x-attribsPrefix': [Object: null prototype] {},
    children: [ [Node] ],
    parent: Node {
      type: 'tag',
      name: 'div',
      namespace: 'http://www.w3.org/1999/xhtml',
      attribs: [Object: null prototype],
      'x-attribsNamespace': [Object: null prototype],
      'x-attribsPrefix': [Object: null prototype],
      children: [Array],
      parent: [Node],
      prev: [Node],
      next: [Node]
    },
    prev: null,
    next: null
  },
  length: 1,
  options: { xml: false, decodeEntities: true },
  _root: <ref *1> LoadedCheerio {
    '0': Node {
      type: 'root',
      name: 'root',
      parent: null,
      prev: null,
      next: null,
      children: [Array],
      'x-mode': 'no-quirks'
    },
    length: 1,
    options: { xml: false, decodeEntities: true },
    _root: [Circular *1]
  },
  prevObject: LoadedCheerio {
    '0': Node {
      type: 'tag',
      name: 'div',
      namespace: 'http://www.w3.org/1999/xhtml',
      attribs: [Object: null prototype],
      'x-attribsNamespace': [Object: null prototype],
      'x-attribsPrefix': [Object: null prototype],
      children: [Array],
      parent: [Node],
      prev: [Node],
      next: [Node]
    },
    length: 1,
    options: { xml: false, decodeEntities: true },
    _root: <ref *1> LoadedCheerio {
      '0': [Node],
      length: 1,
      options: [Object],
      _root: [Circular *1]
    },
    prevObject: <ref *1> LoadedCheerio {
      '0': [Node],
      length: 1,
      options: [Object],
      _root: [Circular *1]
    }
  }
}
Saman
  • Please share the site you're scraping. The "source" you've shown appears to be the browser's element inspector, which is something else. Those elements may well have been injected with JS after the page load, in which case cheerio wouldn't see them in the static HTML you've retrieved. – ggorlen Dec 18 '21 at 18:14
  • `$('div[id="thepics"] div.pic')` -- that is how you would get the children divs. – JM-AGMS Dec 20 '21 at 17:07
  • @ggorlen Yes, it uses JS after the page load. In that case, what is the best alternative to cheerio, if cheerio cannot read the content? – Saman Jan 18 '22 at 12:40
  • It depends on the site. Sometimes you can use their API, other times there's JSON baked into the static HTML as a script, sometimes you need Selenium/Puppeteer. See [How can I scrape pages with dynamic content using node.js?](https://stackoverflow.com/questions/28739098/how-can-i-scrape-pages-with-dynamic-content-using-node-js) – ggorlen Jan 18 '22 at 17:00
  • Voting to close again a year later--there's no [mcve] or site, so there's no value to future visitors or possible answer to be had here. – ggorlen Nov 26 '22 at 02:48
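
To illustrate the "JSON baked into the static HTML as a script" case from the comments, here is a minimal sketch. Everything in it is an assumption made for illustration: the sample HTML, the variable name `window.__PICS__`, and the helper `extractEmbeddedJson` are hypothetical, since the real page was never shared. It uses only a regular expression so that it runs without dependencies; with cheerio already loaded, the script text could instead come from something like `$('#thepics script').html()`.

```javascript
// Hypothetical static HTML: the container holds only a <script>,
// which matches the single 'script' child seen in the log above.
const html = `
<div id="thepics">
  <script>
    window.__PICS__ = [{"src": "/img/a.jpg"}, {"src": "/img/b.jpg"}];
  </script>
</div>`;

// Find `<variableName> = <JSON array or object>;` in the source text and parse it.
// Returns null when no such assignment is present.
// Note: the lazy match is a simplification and would stop early on data
// that contains "];" or "};" inside a string value.
function extractEmbeddedJson(source, variableName) {
  // Escape regex metacharacters in the variable name (e.g. the dots).
  const escaped = variableName.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp(
    escaped + "\\s*=\\s*(\\[[\\s\\S]*?\\]|\\{[\\s\\S]*?\\})\\s*;"
  );
  const match = source.match(pattern);
  return match ? JSON.parse(match[1]) : null;
}

const pics = extractEmbeddedJson(html, "window.__PICS__");
console.log(pics.map((p) => p.src)); // logs the two src values
```

If the data is not embedded like this and the page only builds its elements at runtime, this approach will not apply, and the API or browser-automation routes mentioned in the comments are the ones to try.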

0 Answers