
I am trying to download pictures from some URLs. For some pictures it works fine, but for others I get 403 errors.

For example, this one: http://blog.zenika.com/themes/Zenika/img/zenika.gif

Accessing this picture does not require any authentication. You can open the link yourself and verify that your browser gets it with a 200 status code.

The following code throws an exception: new java.net.URL(url).openStream(). The same happens with org.apache.commons.io.FileUtils.copyURLToFile(new java.net.URL(url), tmp), which uses the same openStream() method under the hood.

java.io.IOException: Server returned HTTP response code: 403 for URL: http://blog.zenika.com/themes/Zenika/img/zenika.gif
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1626) ~[na:1.7.0_45]
at java.net.URL.openStream(URL.java:1037) ~[na:1.7.0_45]
at services.impl.DefaultStampleServiceComponent$RemoteImgUrlFilter$class.downloadAsTemporaryFile(DefaultStampleServiceComponent.scala:548) [classes/:na]
at services.impl.DefaultStampleServiceComponent$RemoteImgUrlFilter$class.services$impl$DefaultStampleServiceComponent$RemoteImgUrlFilter$$handleImageUrl(DefaultStampleServiceComponent.scala:523) [classes/:na]

I develop with Scala / Play Framework. I tried to use the built-in AsyncHttpClient.

// TODO it could be better to use iteratees on the GET call because I think AHC loads the whole body in memory
WS.url(url).get.flatMap { res =>
  if (res.status >= 200 && res.status < 300) {
    val bodyStream = res.getAHCResponse.getResponseBodyAsStream
    val futureFile = TryUtils.tryToFuture(createTemporaryFile(bodyStream))
    play.api.Logger.info(s"Successfully downloaded file $filename with status code ${res.status}")
    futureFile
  } else {
    Future.failed(new RuntimeException(s"Download of file $filename returned status code ${res.status}"))
  }
} recover {
  case NonFatal(e) => throw new RuntimeException(s"Could not downloadAsTemporaryFile url=$url", e)
}

With this AHC code, it works fine. Can someone explain this behavior and why I got a 403 error with the URL.openStream() method?

Sebastien Lorber
  • How many requests do you fire? Just a guess: maybe you get kicked for exceeding the request limit? – serejja Apr 09 '14 at 09:43
  • @serejja it is just a single request, and I have this problem on different services hosting images. You can try a single `new java.net.URL(url).openStream()` on this URL yourself and see that this is not spam protection – Sebastien Lorber Apr 09 '14 at 10:22
  • Some image hosts try to keep bots from downloading images, so if they detect that a request was not sent from a browser they just respond with a 403 status. Try checking whether, for example, a plain curl request from the command line gives you the correct image – biesior Apr 09 '14 at 11:00
  • @biesior The URL in the question works without a problem if I just download the image with `curl`. – Carsten Apr 09 '14 at 11:32
  • @Carsten @biesior the default user agent used by AHC seems to be `NING/1.0`. I don't know which user agent is used by `openStream()` – Sebastien Lorber Apr 09 '14 at 12:27
  • @SebastienLorber openStream() seems to send `User-Agent: Java/1.7.0_21`, for example – ouertani Apr 09 '14 at 13:27
  • Whereas curl uses `curl/7.32.0` as its User-Agent. – ouertani Apr 09 '14 at 13:46
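
The User-Agent that `openStream()` sends can be checked locally without touching the remote host. The following is a minimal sketch, assuming the JDK's built-in `com.sun.net.httpserver` package is available; `UserAgentProbe` and `defaultUserAgent` are hypothetical names introduced only for this illustration.

```scala
import com.sun.net.httpserver.{HttpExchange, HttpHandler, HttpServer}
import java.net.{InetSocketAddress, URL}
import java.util.concurrent.atomic.AtomicReference

object UserAgentProbe {
  /** Starts a throwaway local HTTP server, requests it via URL.openStream(),
    * and returns the User-Agent header that the server received. */
  def defaultUserAgent(): String = {
    val seen = new AtomicReference[String]("")
    // Port 0 lets the OS pick any free port.
    val server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0)
    server.createContext("/", new HttpHandler {
      override def handle(exchange: HttpExchange): Unit = {
        // Record the header as sent by the client ("null" if it is absent).
        seen.set(String.valueOf(exchange.getRequestHeaders.getFirst("User-Agent")))
        exchange.sendResponseHeaders(200, -1) // 200 with no response body
        exchange.close()
      }
    })
    server.start()
    try {
      val port = server.getAddress.getPort
      // Same call as in the question, pointed at the local server.
      new URL(s"http://127.0.0.1:$port/").openStream().close()
      seen.get()
    } finally server.stop(0)
  }

  def main(args: Array[String]): Unit =
    println(s"URL.openStream() sent User-Agent: ${defaultUserAgent()}")
}
```

Running `UserAgentProbe.main` prints the header for the JVM in use, which can then be compared with the agents of AHC and curl mentioned in the comments.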

2 Answers


As mentioned in the comments, some hosts reject requests based on headers such as User-Agent. Here, the default Java user agent seems to be blocked.

This doesn't work:

import java.io.{BufferedReader, InputStreamReader}
import java.net.URL

val urls = """http://blog.zenika.com/themes/Zenika/img/zenika.gif"""
val url = new URL(urls)
val urlConnection = url.openConnection()
val inputStream = urlConnection.getInputStream()
val bufferedReader = new BufferedReader(new InputStreamReader(inputStream))

This works:

val urls = """http://blog.zenika.com/themes/Zenika/img/zenika.gif"""
val url = new URL(urls)
val urlConnection = url.openConnection()
// Replace the default Java/<version> User-Agent before the request is sent
urlConnection.setRequestProperty("User-Agent", """NING/1.0""")
val inputStream = urlConnection.getInputStream()
val bufferedReader = new BufferedReader(new InputStreamReader(inputStream))
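
Since the target is an image, a variant that copies the raw bytes to a file instead of wrapping the stream in a reader may be closer to what the question needs. This is only a sketch under assumptions: `ImageDownloader` and `downloadToTempFile` are hypothetical names, the URL is assumed to be http or https, and which User-Agent value a given host accepts is site-specific.

```scala
import java.io.{IOException, InputStream}
import java.net.{HttpURLConnection, URL}
import java.nio.file.{Files, Path, StandardCopyOption}

object ImageDownloader {
  /** Downloads `url` into a temporary file, sending the given User-Agent.
    * Throws an IOException if the server answers with a non-2xx status. */
  def downloadToTempFile(url: String, userAgent: String): Path = {
    // Assumption: the URL uses http or https, so the cast below is safe.
    val connection = new URL(url).openConnection().asInstanceOf[HttpURLConnection]
    connection.setRequestProperty("User-Agent", userAgent)

    val status = connection.getResponseCode
    if (status < 200 || status >= 300) {
      connection.disconnect()
      throw new IOException(s"Download of $url returned status code $status")
    }

    val target = Files.createTempFile("download-", ".tmp")
    val in: InputStream = connection.getInputStream
    // Copy bytes as-is; a Reader would decode them as text and corrupt binary data.
    try Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING)
    finally in.close()
    target
  }
}
```

A call such as `ImageDownloader.downloadToTempFile(urls, "NING/1.0")` would mirror the working snippet above while keeping the image bytes intact.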
ouertani

I added the "User-Agent" header, but it still didn't work.

Frank