
When using the Diffbot API, do the APIs grab content that is added via JS after the HTML has loaded, or does Diffbot only see the immediately available HTML?

Swader

1 Answer


Yes, Diffbot can see JavaScript-generated content.

Diffbot is a visual learning robot, and it does more than just parse the source code of the web page at the given URL. It's based on a stripped-down, session-less version of Chrome, designed to leak memory and be a bad browser on purpose, all with the goal of being as fast as possible. Diffbot renders the requested page in this browser and then visually inspects it to identify the components we've come to expect.

It's important to note that the Article API does not execute JavaScript code in individual calls, only in bulk API calls, Crawlbot calls, and as part of the Analyze API (currently in beta). You must contact the Diffbot team on a case-by-case basis to enable per-domain full-page renders with JS enabled for the Article API.
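To make the distinction between an individual Article API call and an Analyze API call concrete, here is a minimal sketch that only builds the two request URLs. The base URL, the `v3` path, and the `token` and `url` parameter names are assumptions not stated in this answer, and `build_diffbot_url` is a hypothetical helper; check the current Diffbot documentation before relying on any of it.

```python
from urllib.parse import urlencode

# Assumed endpoint layout (not taken from the answer above); verify it
# against the current Diffbot documentation.
DIFFBOT_BASE = "https://api.diffbot.com/v3"


def build_diffbot_url(api: str, token: str, page_url: str) -> str:
    """Hypothetical helper: build a GET URL for one Diffbot API call."""
    # urlencode percent-encodes the target page URL so that its own
    # query string does not get mixed up with the API parameters.
    query = urlencode({"token": token, "url": page_url})
    return f"{DIFFBOT_BASE}/{api}?{query}"


page = "https://example.com/post?id=1"

# An individual Article API call: per the answer, JS is not executed here.
article_call = build_diffbot_url("article", "YOUR_TOKEN", page)

# The same page sent to the Analyze API, which does execute JS.
analyze_call = build_diffbot_url("analyze", "YOUR_TOKEN", page)
```

Nothing is sent over the network here; pass either URL to the HTTP client of your choice with a real token.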

An additional limitation to keep in mind is that Crawlbot does not harvest links generated by JS. So if links are added to a page via JS, Crawlbot will not see them. It will, however, execute JS when called by other APIs.
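The link limitation is easy to reproduce with any extractor that reads only the delivered HTML. The sketch below is not Crawlbot's implementation; it is an illustration using Python's standard `html.parser`, with a made-up page and a hypothetical `StaticLinkCollector` class, of why a link created by a script never shows up in a static pass.

```python
from html.parser import HTMLParser


class StaticLinkCollector(HTMLParser):
    """Collects href values from <a> tags present in the raw HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


# Made-up page: one link in the markup, one added by a script at runtime.
PAGE = """
<html><body>
<a href="/static-page">Static link</a>
<script>
  var a = document.createElement("a");
  a.href = "/js-page";
  document.body.appendChild(a);
</script>
</body></html>
"""

collector = StaticLinkCollector()
collector.feed(PAGE)
static_links = collector.links  # contains only the link found in the markup
```

The script body is treated as opaque text by the parser, so `/js-page` only exists once a browser has actually run the script.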

Regarding the Custom API, the preview window in the dev interface will not render JavaScript content, but the content will be available when calling the API on the given URL. The Diffbot team is working on fixing this disparity.


Source: Diffbot Support

Swader