12

I would like to download all of the images from this site, but after downloading, all the photos are corrupted. What should I do to download them successfully?

My code:

library(XML)
dir.create('c:/photos')
urls<-paste("http://thedevilsguard.tumblr.com/page/",1:1870,sep="")
doc<-htmlParse(urls[1])
links<-unique(unlist(xpathApply(doc,'//div[@class="timestamp"]/a',xmlGetAttr,'href')))
for (i in 1:length(links)){
  doc2<-htmlParse(links[i])
  link<-xpathApply(doc2,'//div[@class="centre photopage"]//p//img',xmlGetAttr,'src')[[1]][1]
  download.file(link,paste("C:/photos/",basename(link),""))
}
Maciej

2 Answers

20

It looks like you are on Windows. When you download binary files there, you have to set the mode to binary; otherwise the transfer is treated as text and line endings are rewritten, which corrupts images, e.g.

download.file(link, ..., mode = 'wb')

See ?download.file for details.
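
A minimal sketch of how the download step from the question might look with binary mode enabled. The helper names `image_dest` and `download_image` are hypothetical and not part of the question's code. The sketch also swaps `paste()` for `file.path()`, because `paste()` with its default separator would insert spaces around the file name.

```r
# Build the local path for an image URL (hypothetical helper).
# file.path() joins with "/" and adds no stray spaces.
image_dest <- function(link, dest_dir) {
  file.path(dest_dir, basename(link))
}

# Download one image in binary mode so that Windows does not
# rewrite line-ending bytes inside the image data.
download_image <- function(link, dest_dir) {
  dest <- image_dest(link, dest_dir)
  download.file(link, destfile = dest, mode = "wb")
  invisible(dest)
}

# Usage inside the question's loop (not run here):
# download_image(link, "C:/photos")
```

On Unix-alikes `mode = "wb"` makes no practical difference, so it should be safe to keep it in code that runs on several platforms.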

Yihui Xie
3

First, try to download just one image. Do this:

link = "http://29.media.tumblr.com/tumblr_m0q2g8mhGK1qk6uvyo1_500.png"
download.file(link,basename(link))

Does that work?

I notice it's a PNG and NOT a JPEG, so maybe you are trying to read it in as a JPEG.
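
One way to check what a downloaded file really contains, independent of its extension, is to look at its first few bytes. The sketch below is an illustration only; `image_signature` is a hypothetical helper name, and it recognises only the standard PNG and JPEG signatures.

```r
# Guess the image type from the leading bytes of a file (hypothetical helper).
image_signature <- function(path) {
  bytes <- readBin(path, what = "raw", n = 8)
  png_magic <- as.raw(c(0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a))
  jpeg_magic <- as.raw(c(0xff, 0xd8, 0xff))
  if (length(bytes) >= 8 && identical(bytes[1:8], png_magic)) {
    "png"
  } else if (length(bytes) >= 3 && identical(bytes[1:3], jpeg_magic)) {
    "jpeg"
  } else {
    "unknown"
  }
}

# Usage (not run here):
# image_signature(basename(link))
```

A result of "unknown" for a file that should be a PNG or JPEG suggests the bytes were altered or truncated during the download rather than mislabelled.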

Spacedman
  • Yes, of course, I wrote jpg instead of png, sorry, but there's still a problem with downloading images. As far as I can see, there are both png and jpg files there. – Maciej Mar 11 '12 at 14:07
  • That image downloads okay, but viewing it with an image viewer gives me a warning about an unknown Exif (TIFF) type. The image itself (three guys one gun) views fine. What is the nature of the 'corruption' you are having? – Spacedman Mar 11 '12 at 14:41
  • When I download a jpg file, IrfanView says: "Decode error! JPEG datastream contains no image", and for a png file: "Decode error! invalid or unsupported png file". Some look like this: http://postimage.org/image/izrexz9s7/ – Maciej Mar 11 '12 at 14:50
  • But you can view it in your web browser at the same URL okay? What if you view the downloaded image in your web browser instead of irfanview? – Spacedman Mar 11 '12 at 14:53
  • The URL is fine when I view it in my web browser, but after downloading, the file is corrupted. I've tried what you suggested (viewing the image in a web browser), but I get the same problem as in IrfanView or Windows image viewer. – Maciej Mar 11 '12 at 15:00
  • Okay, I think I've done as much as I can. You might need to find a local expert who can try things on your system. – Spacedman Mar 11 '12 at 15:04