I am using Diffbot's Article API to scrape articles from arbitrary sites. Currently I only get a single image per article, but I want to scrape all the images in a given article. Any suggestions would be appreciated.
The Article API should, by default, grab all the images in an article. Here's what I get in the "images" array when I run the Article API on this post:
"images": [
  {
    "pixelHeight": 106,
    "diffbotUri": "image|3|-317133287",
    "primary": true,
    "pixelWidth": 474,
    "url": "http://dab1nmslvvntp.cloudfront.net/wp-content/uploads/2014/09/1410897265phpstormlogo.jpg"
  },
  {
    "pixelHeight": 375,
    "diffbotUri": "image|3|-2098856075",
    "pixelWidth": 500,
    "url": "http://dab1nmslvvntp.cloudfront.net/wp-content/uploads/2014/09/1410897372Spear_point_knife_blade.jpg"
  },
  {
    "pixelHeight": 525,
    "diffbotUri": "image|3|-878345903",
    "pixelWidth": 700,
    "url": "http://dab1nmslvvntp.cloudfront.net/wp-content/uploads/2014/09/1410897486CXM-Framework.jpg"
  },
  {
    "pixelHeight": 375,
    "diffbotUri": "image|3|-1729707743",
    "pixelWidth": 500,
    "url": "http://dab1nmslvvntp.cloudfront.net/wp-content/uploads/2014/09/1410897666Fotolia_57724999_Subscription_Monthly_S.jpg"
  },
  {
    "pixelHeight": 360,
    "diffbotUri": "image|3|805836010",
    "pixelWidth": 320,
    "url": "http://dab1nmslvvntp.cloudfront.net/wp-content/uploads/2014/09/1410897716cordova_bot.png"
  }
],
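To read every image rather than just one, iterate over the whole "images" array. The sketch below is a minimal illustration, assuming the v3 endpoint `https://api.diffbot.com/v3/article` with `token` and `url` query parameters, and a response whose top-level "objects" list holds the extracted article(s), each carrying an "images" array like the one above. The function names `fetch_article` and `image_urls` are hypothetical helpers, not part of any Diffbot client library; check the field layout against the current Diffbot docs before relying on it.

```python
import json
import urllib.parse
import urllib.request

# Assumed v3 Article API endpoint; the token is your own Diffbot API key.
ARTICLE_API = "https://api.diffbot.com/v3/article"


def fetch_article(token: str, page_url: str) -> dict:
    """Hypothetical helper: call the Article API for page_url and decode the JSON reply."""
    query = urllib.parse.urlencode({"token": token, "url": page_url})
    with urllib.request.urlopen(f"{ARTICLE_API}?{query}") as resp:
        return json.load(resp)


def image_urls(response: dict) -> list[str]:
    """Hypothetical helper: collect the URL of every image in every article object.

    Assumes the v3 layout: a top-level "objects" list whose entries carry an
    "images" list. The image flagged "primary" is only one entry among them.
    """
    urls = []
    for obj in response.get("objects", []):
        for img in obj.get("images", []):
            if "url" in img:
                urls.append(img["url"])
    return urls
```

Usage would look like `image_urls(fetch_article("YOUR_TOKEN", article_url))`, where `YOUR_TOKEN` and `article_url` are placeholders. The point of the design is simply to loop over the full array instead of picking out the single entry marked `"primary": true`.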
If you don't get the same results for a particular URL, you can always define a custom ruleset that grabs the images. I wrote some tutorials on extracting repeated data here, and there are some hints here, too.
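Conceptually, a rule for repeated data boils down to "select every element matching a pattern inside the article body and take one attribute from each". The sketch below illustrates that idea locally with Python's standard `html.parser`; it is a swapped-in stand-in, not Diffbot's ruleset syntax or API. It assumes, hypothetically, that the page wraps its body in an `<article>` element and that each image's address is in its `src` attribute; `ArticleImageCollector` and `article_images` are illustrative names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ArticleImageCollector(HTMLParser):
    """Collect the src of every <img> nested inside an <article> element."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.depth = 0  # how many <article> elements we are currently inside
        self.srcs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.depth += 1
        elif tag == "img" and self.depth > 0:
            src = dict(attrs).get("src")
            if src:
                # Resolve relative paths against the page URL.
                self.srcs.append(urljoin(self.base_url, src))

    def handle_endtag(self, tag):
        if tag == "article" and self.depth > 0:
            self.depth -= 1


def article_images(html: str, base_url: str) -> list[str]:
    """Return absolute URLs of all images inside <article> blocks, in document order."""
    parser = ArticleImageCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.srcs
```

Images outside the `<article>` container (site logos, sidebar ads) are skipped, which mirrors why a ruleset is usually scoped to the article body. Self-closing `<img ... />` tags also work, because `HTMLParser` routes them through `handle_starttag` by default.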
Can you give us the URL of the article that makes the API fail to return all images? Maybe we can solve the problem together by looking at the source of the issue.

Swader
- Thanks a lot, @Swader, for the awesome tutorials – abdulbarik Nov 19 '14 at 06:59
- @abarik you're welcome! Have a look at sitepoint.com/tag/diffbot for more – Swader Nov 19 '14 at 07:14