I wonder if anyone can point me in the right direction.

I have a rather large spreadsheet of product info that needs plugging into a shop. The tricky bit is that each row has a link pointing to the relevant page on another site that has the product's details. What I need to do is grab the relevant image from that page and save it locally so I can use it later. The reason I'm thinking along these lines is that there are 7500 products.

My friend suggested I could maybe use PHP and fopen.

The image does have an outer tag ID which I can refer to.

I was thinking of iterating through the spreadsheet. This is the type of link I have to work with:

http://www.apc.com/resource/include/techspec_index.cfm?base_sku=APCRBC105

The images themselves are called something random, but I figured I could rename them to the more relevant SKU number as I grab them.

  • so iterate through the spreadsheet by SKU number
  • identify the image by the relevant id on the page (I'm assuming it's in the same place on every page)
  • save the image while renaming to the correct SKU number

Any ideas on how I could go about this? The thought of visiting each page manually and saving the image 7500 times doesn't seem the best way forward!

Thanks for looking

mro
  • *(related)* [Best Methods to parse HTML](http://stackoverflow.com/questions/3577641/best-methods-to-parse-html/3577662#3577662) – Gordon Oct 10 '11 at 10:20
  • Actually, I don't see any other way forward than what you outlined. – Ben Lee Oct 10 '11 at 10:20
  • But you should be using the larger images at http://www.apc.com/products/moreimages.cfm?partnum=APCRBC105 – Ben Lee Oct 10 '11 at 10:21
  • yeah actually I could do with those larger images, good point! – mro Oct 10 '11 at 10:24

2 Answers

If there aren't any issues regarding copyrighted material, take a look at Google Refine.

You can grab content from websites based on your cell values and use them afterwards to build more complex scenarios. See the screencasts for more info (screencast 3 talks about fetching values via URLs).

Once you have the image URLs in your spreadsheet, it should be fairly easy to fetch them via curl or similar.
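As a rough sketch of that last step, assuming the spreadsheet has been exported to CSV, the rows could be turned into (SKU, image URL) pairs ready for downloading. The column names `sku` and `image_url` and the helper `sku_image_pairs` are made up for illustration; adjust them to whatever your export actually contains.

```python
import csv
import io


def sku_image_pairs(csv_file, sku_column="sku", url_column="image_url"):
    """Yield (sku, image_url) for every row that has both values.

    The column names are assumptions, not something the spreadsheet is
    known to use.
    """
    for row in csv.DictReader(csv_file):
        sku = (row.get(sku_column) or "").strip()
        url = (row.get(url_column) or "").strip()
        if sku and url:  # skip rows where the URL lookup found nothing
            yield sku, url


# Small in-memory example instead of a real export file.
sample = io.StringIO(
    "sku,image_url\n"
    "APCRBC105,http://example.com/a.jpg\n"
    "MISSING,\n"
)
pairs = list(sku_image_pairs(sample))
```

Each pair can then be handed to curl or any HTTP client, using the SKU as the local file name.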

konsolenfreddy
  • Hi, it's basically a spreadsheet of the products, and the links are off their supplier's website - so this is all good. Thanks, I will take a look at Google Refine. – mro Oct 10 '11 at 10:33

Rip the base_sku from your links.

APCRBC105
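A minimal sketch of that step in Python, assuming every link carries the SKU in a `base_sku` query parameter like the example in the question (the function name `sku_from_link` is just for illustration):

```python
from urllib.parse import parse_qs, urlparse


def sku_from_link(link):
    """Return the base_sku query parameter of a link, or None if absent."""
    query = parse_qs(urlparse(link).query)
    values = query.get("base_sku")
    return values[0] if values else None


sku = sku_from_link(
    "http://www.apc.com/resource/include/techspec_index.cfm?base_sku=APCRBC105"
)
print(sku)  # APCRBC105
```

Parsing the query string rather than slicing the URL keeps this working if the parameters ever appear in a different order.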

Then use curl to fetch the image page

http://www.apc.com/products/moreimages.cfm?partnum=APCRBC105

Rip the image link out of this markup with a regular expression:

    <div align="center">
    <img align="center" src="http://www.apcmedia.com/resource/images/500/Front_Left/35531838-5056-9170-D33F24AE47742E6C_pr.jpg" />
    </div>
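One way the regex could look, as a sketch only: it assumes the image always sits directly inside a `<div align="center">` exactly as in the snippet above, which I have not checked across all product pages. The names `IMG_IN_CENTER_DIV` and `image_url_from_html` are made up for this example.

```python
import re

# Matches the src of an <img> that directly follows <div align="center">.
IMG_IN_CENTER_DIV = re.compile(
    r'<div align="center">\s*<img\b[^>]*\bsrc="([^"]+)"',
    re.IGNORECASE,
)


def image_url_from_html(html):
    """Return the first matching image URL, or None if the markup differs."""
    match = IMG_IN_CENTER_DIV.search(html)
    return match.group(1) if match else None


page = '''<div align="center">
<img align="center" src="http://www.apcmedia.com/resource/images/500/Front_Left/35531838-5056-9170-D33F24AE47742E6C_pr.jpg" />
</div>'''
image_url = image_url_from_html(page)
```

If the markup turns out to vary between pages, a real HTML parser (see the related link in the comments on the question) will be more robust than a regex.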

Then use curl again to fetch the actual image and save it under the SKU. That should work.
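For the download-and-rename step, here is a hedged sketch using Python's standard library in place of curl. The helper names `fetch_bytes` and `save_image`, the `images` output folder and the `.jpg` fallback extension are all assumptions for illustration.

```python
import os
from urllib.parse import urlparse
from urllib.request import urlopen


def fetch_bytes(url, timeout=30):
    """Download a URL and return the raw response body."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def save_image(sku, image_url, out_dir="images", fetch=fetch_bytes):
    """Save the image as <out_dir>/<sku><ext> and return the file path.

    The extension is taken from the image URL, falling back to .jpg.
    `fetch` can be swapped out, e.g. to test without network access.
    """
    ext = os.path.splitext(urlparse(image_url).path)[1] or ".jpg"
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, sku + ext)
    with open(path, "wb") as handle:
        handle.write(fetch(image_url))
    return path
```

Calling `save_image("APCRBC105", image_url)` would write `images/APCRBC105.jpg`. With 7500 products it is probably worth pausing between requests and logging any SKU whose page or image fails, so the run can be resumed.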

Paolo_Mulder