3

I only have a link to a product page on Amazon. How do I get all the information (photo, price, etc.) in my Ruby program, using just this link?

JDB
user85748
  • How did you finally solve this? Did you use regular expressions? URL page scraping? – Jayaram Jun 27 '12 at 19:59
  • Yes, did you ever figure out a better way to do this than trying to parse out the ItemID? – cjn Nov 04 '14 at 02:18

5 Answers

7

Here is the list of supported URL patterns that Amazon discloses for its oEmbed support. The Product Advertising API only comes into the picture after you have parsed one of these URLs and extracted the ASIN:

http://*amazon.*/gp/product/*

http://*amazon.*/*/dp/*

http://*amazon.*/dp/*

http://*amazon.*/o/ASIN/*

http://*amazon.*/gp/offer-listing/*

http://*amazon.*/*/ASIN/*

http://*amazon.*/gp/product/images/*

http://*amazon.*/gp/aw/d/*

http://www.amzn.com/*

http://amzn.com/*
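
As a rough sketch of that parsing step, the patterns above can be folded into a small matcher built on Ruby's standard `uri` library. The helper name `extract_asin` and the constants are hypothetical, and two things are assumed rather than taken from the list: that an ASIN is ten uppercase letters or digits, and that `amzn.com` short links carry the ASIN as the first path segment.

```ruby
require "uri"

# Assumption: an ASIN is 10 uppercase letters or digits.
ASIN = /[A-Z0-9]{10}/

# Path markers from the pattern list above; the segment after each marker
# is treated as the ASIN. "/o/ASIN/" is covered by the plain "ASIN" marker.
AMAZON_PATH = %r{/(?:dp|gp/product/images|gp/product|gp/offer-listing|gp/aw/d|ASIN)/(#{ASIN})(?=/|\z)}

# Assumption: amzn.com short links look like http://amzn.com/<ASIN>.
SHORT_PATH = %r{\A/(#{ASIN})(?=/|\z)}

# Hypothetical helper: returns the ASIN as a String, or nil if none is found.
def extract_asin(url)
  uri  = URI.parse(url.to_s.strip)
  host = uri.host.to_s.downcase
  path = uri.path.to_s # the query string is not part of the path

  if host =~ /(\A|\.)amazon\./
    path[AMAZON_PATH, 1]
  elsif host =~ /\A(www\.)?amzn\.com\z/
    path[SHORT_PATH, 1] || path[AMAZON_PATH, 1]
  end
rescue URI::InvalidURIError
  nil
end
```

Calling `extract_asin` with a product link gives a string that can then be passed to whichever Product Advertising API client you use; links from other hosts, or links without a recognizable marker, yield `nil`.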

JDB
Sushant Khurana
2

I'm using Rails, and I found the amazon-ecs library, which I'm experimenting with. Still, I need some kind of ID (a product ID?) to get the details of a particular product. For example, consider this link to the Kindle:

http://www.amazon.com/Kindle-Amazons-Wireless-Reading-Generation/dp/B00154JDAI/ref=amb_link_84372271_1?pf_rd_m=ATVPDKIKX0DER&pf_rd_s=center-1&pf_rd_r=06JJGQP9J3BHKPE38SXP&pf_rd_t=101&pf_rd_p=478184871&pf_rd_i=507846

In that link, I noticed the ASIN, which is B00154JDAI.

It looks like I can use this ID to get product information (using amazon-ecs). I just need to parse the URL to get the ASIN.

Is there any other way to do it?

No, I am not going to do screen scraping; that is never a good idea.

Damon
user85748
  • Is there a reason you want another way to do it? Amazon's URLs are reasonably uniform so extracting the ASIN should not generally be a problem and amazon-ecs does provide a pretty simple abstraction. If you have some motivation for needing another way though.. – Peter Cooper May 23 '09 at 18:15
  • I randomly checked some URLs. Found that they have something called ASIN (Amazon Standard Item Number). It appears somewhere in the URL, but not in the same format all the time. Sometimes they have /dp/ASIN, sometimes they have /gp/ASIN and sometimes they have just ASIN. There might be other combinations, I am not sure. Is there any API in amazon-ecs that can get me the ASIN if I pass the URL? – user85748 May 23 '09 at 18:19
  • (Rolled back to original because the edit made the post kind of nonsensical. "Look at this link", referring to GET vars in that link is meaningless when the link is changed to a redirector on SO which does not contain the GET vars at all.) – Damon Mar 13 '13 at 10:10
1

If you want to do this, the Nokogiri or hpricot libraries both allow HTML parsing and searching. However, this kind of screen-scraping is notoriously unreliable (as it may break any time Amazon decides to reorganize their HTML), so if you're planning to do this sort of thing for any length of time I'd recommend leveraging the Amazon Product Advertising API instead.

Greg Campbell
0

In your program, fetch the page, parse the HTML, and filter out the information you need. There may be Ruby libraries (that I am unaware of) that parse HTML.

hpricot seems to do what you want.

Alan Haggai Alavi
0

You should use the library Ruby/AWS (google for it, my karma is not high enough to allow external links...). It has been written exactly for that.

You might need to use the built-in Search to find the item you're looking for. After that, the API gives access to pictures, links and all usable information.

Oct