
I am using Diffbot's Article API to scrape articles from arbitrary sites. Currently I get only a single image per article, but I want to scrape all of the images for a given article. Any suggestions would be appreciated.

abdulbarik

1 Answer


The Article API should, by default, grab all the images in an article. Here's what I get in the "images" array when I run the Article API on this post:

"images": [
        {
          "pixelHeight": 106,
          "diffbotUri": "image|3|-317133287",
          "primary": true,
          "pixelWidth": 474,
          "url": "http://dab1nmslvvntp.cloudfront.net/wp-content/uploads/2014/09/1410897265phpstormlogo.jpg"
        },
        {
          "pixelHeight": 375,
          "diffbotUri": "image|3|-2098856075",
          "pixelWidth": 500,
          "url": "http://dab1nmslvvntp.cloudfront.net/wp-content/uploads/2014/09/1410897372Spear_point_knife_blade.jpg"
        },
        {
          "pixelHeight": 525,
          "diffbotUri": "image|3|-878345903",
          "pixelWidth": 700,
          "url": "http://dab1nmslvvntp.cloudfront.net/wp-content/uploads/2014/09/1410897486CXM-Framework.jpg"
        },
        {
          "pixelHeight": 375,
          "diffbotUri": "image|3|-1729707743",
          "pixelWidth": 500,
          "url": "http://dab1nmslvvntp.cloudfront.net/wp-content/uploads/2014/09/1410897666Fotolia_57724999_Subscription_Monthly_S.jpg"
        },
        {
          "pixelHeight": 360,
          "diffbotUri": "image|3|805836010",
          "pixelWidth": 320,
          "url": "http://dab1nmslvvntp.cloudfront.net/wp-content/uploads/2014/09/1410897716cordova_bot.png"
        }
      ],
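Once you have that array, pulling every URL out of it is straightforward. Here is a minimal Python sketch, assuming you have already parsed the JSON and have the article object that holds the "images" key shown above. In a full response that object may be nested (for example inside an "objects" list), so check it against your own output. The helper name `image_urls` and the trimmed `sample` are only for illustration.

```python
import json


def image_urls(article: dict) -> list:
    """Return every image URL in an article object, primary image first."""
    images = article.get("images", [])
    # sorted() is stable, so non-primary images keep their original order.
    ordered = sorted(images, key=lambda img: not img.get("primary", False))
    return [img["url"] for img in ordered if "url" in img]


# Trimmed copy of the "images" array from the output above.
sample = json.loads("""
{
  "images": [
    {
      "pixelHeight": 106,
      "diffbotUri": "image|3|-317133287",
      "primary": true,
      "pixelWidth": 474,
      "url": "http://dab1nmslvvntp.cloudfront.net/wp-content/uploads/2014/09/1410897265phpstormlogo.jpg"
    },
    {
      "pixelHeight": 375,
      "diffbotUri": "image|3|-2098856075",
      "pixelWidth": 500,
      "url": "http://dab1nmslvvntp.cloudfront.net/wp-content/uploads/2014/09/1410897372Spear_point_knife_blade.jpg"
    }
  ]
}
""")

urls = image_urls(sample)
for url in urls:
    print(url)
```

If your code only ever reads the entry flagged "primary", or only the first element, you will see a single image even when the API returned several.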

If you're not getting the same results for a URL, you can always define a custom ruleset that'll grab them. I wrote some tutorials on extracting repeated data here, and there are some hints here, too.

Can you give us the URL of the article that makes the API fail to return all images? Maybe we can solve the problem together by looking at the source of the issue.

Swader