
I'm trying to download Twitter data using the twitteR package.

I keep getting this error message:

"Error in function (type, msg, asError = TRUE) : couldn't connect to host"

I believe this is because I'm doing this on my work computer and I need to pass the details of the proxy server.

To test this, I tried an example given in one of the answers to a similar question about Proxy Setting for R.

If I enter:

library("RCurl")
getURL("http://stackoverflow.com")

Then I get the same error message as when I try to use twitteR:

"Error in function (type, msg, asError = TRUE) : couldn't connect to host"

However, if I pass the details of my proxy server, it works without a problem:

library("RCurl")
opts <- list(
  proxy         = "123.456.7.89", 
  proxyusername = "tumbledown", 
  proxypassword = "mypassword",
  proxyport     = 8080
)
getURL("http://stackoverflow.com", .opts = opts)

However, I'm having an issue with passing the details of my proxy server to twitteR. I've tried setting it in R's Rprofile.site file using:

http_proxy="http://tumbledown:mypassword@123.456.7.89:8080/"

But it doesn't seem to solve the problem. Where am I going wrong?
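
For context, Rprofile.site is sourced as ordinary R code, so a bare `http_proxy="..."` line there only creates an R variable, not an environment variable. Below is a minimal sketch of setting the environment variables from R instead. It assumes the underlying libcurl picks up `http_proxy` and `https_proxy`, and it reuses the placeholder credentials and address from above. Whether twitteR/ROAuth then honours them is not confirmed here.

```r
# Placeholder proxy URL, reusing the made-up credentials from the question.
proxy_url <- "http://tumbledown:mypassword@123.456.7.89:8080/"

# Sys.setenv() sets real environment variables for the current R session;
# a bare `http_proxy = "..."` assignment would only create an R variable.
Sys.setenv(http_proxy = proxy_url, https_proxy = proxy_url)

# Confirm what the session now sees.
Sys.getenv(c("http_proxy", "https_proxy"))
```

Setting `https_proxy` as well is a guess that matters here because the OAuth endpoints are HTTPS URLs.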

Edit 1: Here's the code I'm trying to run. Looking at it now, I realise this is probably more of an ROAuth issue:

library("twitteR")
library("ROAuth")
library("RCurl")

Credentials <- OAuthFactory$new(
  consumerKey = "MY_CONSUMER_KEY",
  consumerSecret = "MY_CONSUMER_SECRET",
  requestURL = "https://api.twitter.com/oauth/request_token",
  authURL = "https://api.twitter.com/oauth/authorize",
  accessURL = "https://api.twitter.com/oauth/access_token")


# I have then tried both of the handshake methods below:

# 1: handshake with the default SSL settings
Credentials$handshake()

# 2: handshake with an explicit CA certificate bundle
download.file(url="http://curl.haxx.se/ca/cacert.pem", destfile="cacert.pem")
Credentials$handshake(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl"))
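
Note that method 2 downloads cacert.pem into the working directory but then passes the copy bundled with RCurl as `cainfo`. A small sketch for checking which of the two files actually exists, assuming the download went to the current working directory (the variable names are illustrative only):

```r
# Path to the CA bundle shipped with RCurl; "" if RCurl is not installed.
bundled <- system.file("CurlSSL", "cacert.pem", package = "RCurl")

# Path where the download.file() call above would have written its copy.
downloaded <- file.path(getwd(), "cacert.pem")

# Show both candidate paths and whether each file is present.
print(c(bundled = bundled, downloaded = downloaded))
print(file.exists(c(bundled, downloaded)))
```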

Edit 2:

The following code seems to get me partway there. If I set these options, I can begin the handshake process with Twitter (intermittently; it still fails sometimes).

options(RCurlOptions = list(
    verbose = TRUE,
    proxy ="http://123.456.7.89:8080",
    proxyuserpwd="tumbledown:mypassword",
    proxyauth="ntlm"))
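
The same option list can also be assembled from its parts, which makes it easier to see how the proxy URL and the `user:password` string are formed. This is only a sketch: `make_proxy_opts` is a hypothetical helper, not part of RCurl, and the values are the placeholders used above. As far as I understand, RCurl reads `RCurlOptions` as session-wide defaults for new curl handles.

```r
# Hypothetical helper (not an RCurl function): builds the option list above.
make_proxy_opts <- function(host, port, user, password, auth = "ntlm") {
  list(
    verbose      = TRUE,
    proxy        = sprintf("http://%s:%d", host, as.integer(port)),
    proxyuserpwd = paste(user, password, sep = ":"),
    proxyauth    = auth
  )
}

opts <- make_proxy_opts("123.456.7.89", 8080, "tumbledown", "mypassword")

# Register the list as the session-wide default options.
options(RCurlOptions = opts)

# Inspect what is currently registered.
str(getOption("RCurlOptions"))
```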

I am then asked to enter a PIN from Twitter after following a URL (which I have to type in laboriously because, for some reason, it won't let me copy and paste it). The handshake then seems to get partway through before failing to complete. Here's the verbose output (some details removed or altered):

* About to connect() to proxy 123.456.7.89 port 8080 (#0)
*   Trying 123.456.7.89... * connected
* Connected to 123.456.7.89 (123.456.7.89) port 8080 (#0)
* Establish HTTP proxy tunnel to api.twitter.com:443
> CONNECT api.twitter.com:443 HTTP/1.1
Host: api.twitter.com:443
Proxy-Connection: Keep-Alive

< HTTP/1.1 407 Proxy Authentication Required ( Forefront TMG requires authorization to fulfill the request. Access to the Web Proxy filter is denied.  )
< Via: 1.1 ORG-TMG1
< Proxy-Authenticate: Negotiate
< Proxy-Authenticate: Kerberos
< Proxy-Authenticate: NTLM
< Connection: close
< Proxy-Connection: close
< Pragma: no-cache
< Cache-Control: no-cache
< Content-Type: text/html
< Content-Length: 719   
< 
* Ignore 719 bytes of response-body
* Received HTTP code 407 from proxy after CONNECT
* About to connect() to proxy 123.456.7.89 port 8080 (#0)
*   Trying 123.456.7.89... * connected
* Connected to 123.456.7.89 (123.456.7.89) port 8080 (#0)
* Establish HTTP proxy tunnel to api.twitter.com:443
* Proxy auth using NTLM with user 'ORG\tumbledown'
> CONNECT api.twitter.com:443 HTTP/1.1
Host: api.twitter.com:443
Proxy-Authorization: NTLM <LOTS OF RANDOM LETTERS>==
Proxy-Connection: Keep-Alive

< HTTP/1.1 407 Proxy Authentication Required ( Access is denied.  )
< Via: 1.1 ORG-TMG1
< Proxy-Authenticate: NTLM <LOTS OF RANDOM LETTERS>==
< Connection: Keep-Alive
< Proxy-Connection: Keep-Alive
< Pragma: no-cache
< Cache-Control: no-cache
< Content-Type: text/html
< Content-Length: 0     
< 
* Establish HTTP proxy tunnel to api.twitter.com:443
* Proxy auth using NTLM with user 'ORG\tumbledown'
> CONNECT api.twitter.com:443 HTTP/1.1
Host: api.twitter.com:443
Proxy-Authorization: NTLM <LOTS OF RANDOM LETTERS>=
Proxy-Connection: Keep-Alive

< HTTP/1.1 200 Connection established
< Via: 1.1 ORG-TMG1
< Connection: Keep-Alive
< Proxy-Connection: Keep-Alive
< 
* Proxy replied OK to CONNECT request
* successfully set certificate verify locations:
*   CAfile: \\ORG-nas/tumbledown/R/win-library/2.15/RCurl/CurlSSL/cacert.pem
  CApath: none
* SSL connection using RC4-SHA
* Server certificate:
*    subject: C=US; ST=California; L=San Francisco; O=Twitter, Inc.; OU=Twitter Security; CN=api.twitter.com
*    start date: 2013-04-08 00:00:00 GMT
*    expire date: 2013-12-31 23:59:59 GMT
*    subjectAltName: api.twitter.com matched
*    issuer: C=US; O=VeriSign, Inc.; OU=VeriSign Trust Network; OU=Terms of use at https://www.verisign.com/rpa (c)09; CN=VeriSign Class 3 Secure Server CA - G2
*    SSL certificate verify ok.
> POST /oauth/access_token HTTP/1.1
Host: api.twitter.com
Accept: */*
Content-Length: 297
Content-Type: application/x-www-form-urlencoded

< HTTP/1.1 200 OK
< cache-control: no-cache, no-store, must-revalidate, pre-check=0, post-check=0
< content-length: 160
< content-type: text/html; charset=utf-8
< date: Tue, 23 Apr 2013 11:47:21 GMT
< etag: "<LOTS OF RANDOM LETTERS>"
< expires: Tue, 31 Mar 1981 05:00:00 GMT
< last-modified: Tue, 23 Apr 2013 11:47:21 GMT
< pragma: no-cache
< server: tfe
< set-cookie: _twitter_sess=<LOTS OF RANDOM LETTERS>--<LOTS OF RANDOM LETTERS>; domain=.twitter.com; path=/; HttpOnly
< set-cookie: guest_id=<LOTS OF RANDOM LETTERS>; Domain=.twitter.com; Path=/; Expires=Thu, 23-Apr-2015 11:47:21 UTC
< status: 200 OK
< strict-transport-security: max-age=123456789
< vary: Accept-Encoding
< x-frame-options: SAMEORIGIN
< x-mid: <LOTS OF RANDOM LETTERS>
< x-runtime: 0.04538
< x-transaction: <LOTS OF RANDOM LETTERS>
< x-xss-protection: 1; mode=block
< 
* Connection #0 to host 123.456.7.89 left intact
Error: Proxy Authentication Required ( Forefront TMG requires authorization to fulfill the request. Access to the Web Proxy filter is denied.  )
Tumbledown
  • What platform are you on? Have you tried starting R from the command line and using `"path_to_R\bin\x64\Rgui.exe" http_proxy=http://tumbledown:mypassword@123.456.7.89:8080/` Info from the [**R for Windows FAQ**](http://cran.r-project.org/bin/windows/base/rw-FAQ.html#The-Internet-download-functions-fail_002e) – Simon O'Hanlon Apr 19 '13 at 11:37
  • Windows and running RStudio. I've tried your suggestion, which gives an interesting result. getURL now works without having to pass the opts list of proxy details. But twitteR still won't handshake and returns the "couldn't connect to host" error. – Tumbledown Apr 19 '13 at 12:21
  • I guess the next step is to add the actual code you are using to try and connect to the twitter API. Don't forget to blank out your `consumerkey` and `consumersecret`! – Simon O'Hanlon Apr 19 '13 at 16:16
  • Have added the code as an edit to the main question. – Tumbledown Apr 22 '13 at 07:58
  • Great! I'll have a look when I get into work and am on my Windows box. It's usually something small and troublesome. BTW - have you tried it outside of R studio? i.e. on the RGui? – Simon O'Hanlon Apr 22 '13 at 08:06
  • Much appreciated! Yeah I've tried RGui too, same error message so at least it's consistent I guess! – Tumbledown Apr 22 '13 at 08:52
