Consider this simple RCurl function to report download progress:
library(RCurl)
curlDown=function(url, follow=TRUE){
  # print the 'down' vector each time progress is reported
  x=getURL(url, followlocation=follow, noprogress = FALSE,
           progressfunction=function(down,up) cat(down, '\n'))
}
Note that with followlocation=TRUE (the default for follow in curlDown) we agree to follow any redirect location that the server sends as part of the HTTP response header.
We get:
curlDown("http://www.example.com")
# 0 0
# 1270 1079
# 1270 1127
# 1270 1270
# 1270 1270
# 1270 1270
As you can see, the down variable passed to the callback by RCurl is a numeric vector, where the first element is the total download size in bytes and the second is the running download size. Due to space constraints I don't show it here, but on separate inspection I saw that the former equals the Content-Length field in the response header.
Not every server gives the Content-Length field in the response header:
curlDown("http://www.google.it")
# 0 0
# 0 603
# ... blah blah
# 0 44848
# 0 44848
In this case RCurl sets the missing total value to zero (would NA have been better?).
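To make the layout of down concrete, here is a minimal sketch of a helper that formats a single progress report. The name format_progress is hypothetical, and treating a zero total as "unknown" is an assumption drawn from the output above, not from the RCurl documentation.

```r
# Hypothetical helper: format the two-element 'down' vector as text.
# Assumption: down[1] is the total size and down[2] the running size,
# and a total of zero means the server sent no Content-Length.
format_progress <- function(down) {
  total <- down[[1]]
  now <- down[[2]]
  if (total <= 0) total <- NA_real_  # treat a zero total as unknown

  if (is.na(total)) {
    sprintf("%.0f bytes (total unknown)", now)
  } else {
    sprintf("%.0f of %.0f bytes (%.0f%%)", now, total, 100 * now / total)
  }
}
```

It could be called from within the callback, for example progressfunction=function(down,up) cat(format_progress(down), '\n').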
The main Google domain, ".com", redirects to a country-specific domain, for example ".it" if you are querying from the country associated with that domain (Italy). If you are physically located in Italy, you get:
curlDown("http://www.google.com")
# 0 0
# 274 274
# 274 274
# 274 274
# 274 0
# 274 0
# 274 603
# ... blah blah
# 274 44896
# 274 44896
These results are strange. If you compare the running download values with those from the previous curlDown("http://www.google.it"), you see that after the redirect the values are the same, as expected; but the total is smaller than the running download!
To understand the problem, let's not follow the redirect location:
curlDown("http://www.google.com", follow=FALSE)
# 0 0
# 274 274
# 274 274
# 274 274
The main domain server, .com, sends a Content-Length of 274 bytes, while the redirected server does not (see the zeros in curlDown("http://www.google.it")).
The problem is that, after the redirect, RCurl does not update the value of the total download size (to zero, in the case of unknown size); it remains stuck at the stale value of 274 bytes.
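As a possible workaround on the R side, here is a minimal sketch of a stateful helper that distrusts a total that looks stale. The name make_progress and the heuristics are my own assumptions, based only on the output shown above: a running size that falls back is taken as the start of a new transfer, and the total is reported as NA when it is zero, smaller than the running size, or unchanged since that restart.

```r
# Hypothetical factory: returns a function that cleans up the 'down' vector.
make_progress <- function() {
  last_now <- 0
  stale_total <- NA_real_  # total seen when a restart was suspected

  function(down, up = NULL) {
    total <- down[[1]]
    now <- down[[2]]

    # A running size that falls back suggests a new transfer has begun
    # (e.g. after a redirect); remember the total as possibly stale.
    if (now < last_now) stale_total <<- total
    last_now <<- now

    unknown <- total <= 0 || now > total ||
      (!is.na(stale_total) && total == stale_total)
    if (unknown) total <- NA_real_

    c(total = total, now = now)
  }
}

# Replay part of the sequence printed for http://www.google.com
p <- make_progress()
for (d in list(c(0, 0), c(274, 274), c(274, 0), c(274, 603))) {
  print(p(d))
}
```

Inside the callback one would create the helper once and then call it in place of printing down directly, for example cat(p(down), '\n'). This only hides the symptom; it does not tell us whether the stale total comes from RCurl or from the underlying library.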
Is this a bug, or am I missing something?