
For example:

if(url.exists("http://www.google.com")) {
    # Two ways to submit a query to google. Searching for RCurl
    getURL("http://www.google.com/search?hl=en&lr=&ie=ISO-8859-1&q=RCurl&btnG=Search")
    # Here we let getForm do the hard work of combining the names and values.
    getForm("http://www.google.com/search", hl="en", lr="",ie="ISO-8859-1", q="RCurl", btnG="Search")
    # And here if we already have the parameters as a list/vector.
    getForm("http://www.google.com/search", .params = c(hl="en", lr="", ie="ISO-8859-1", q="RCurl", btnG="Search"))
}

This is an example from the RCurl package manual. However, it does not work for me:

> url.exists("http://www.google.com")
[1] FALSE

I found an answer to a similar problem in the question "Rcurl: url.exists returns false when url does exists". It says the cause is that the default user agent is not useful. But I do not understand what a user agent is or how to use one.

Also, this error occurred when I was working at my company. I tried the same code at home, and it worked fine. So I am guessing the cause is a proxy, or there may be some other reason that I have not realized.

I need to use RCurl to run search queries on Google and then extract information such as titles and descriptions from the website. In this case, how do I use a user agent? Or can the httr package do this?

Feng Chen
  • I'm not able to reproduce this problem. `url.exists("http://www.google.com") [1] TRUE` Your problem is unrelated to the user agent. That problem was specific to the server in that question and certainly does not apply to Google.com. When you are on the computer that cannot reach Google via R are you on a company network? You probably need to route R's requests through a company proxy or VPN. – Hack-R Nov 02 '16 at 22:45
  • Yes, that is what I guessed. I think it fails because I am on a company network. Could you please tell me how to do this through a company proxy or VPN? I know there is a parameter called proxy in RCurl; I just do not know how to set it up. – Feng Chen Nov 02 '16 at 22:55
  • That will depend on your operating system, if it's proxy or VPN, and on the specific configuration. It is probably easiest to contact your company's IT helpdesk if you don't know offhand. You *might* be able to use that option, but it would probably be better and may be necessary to proxy/tunnel all of R or RStudio rather than just this one function. – Hack-R Nov 02 '16 at 23:00
  • Have you tried returning the header file by specifying `.header = TRUE`? – seasmith Nov 02 '16 at 23:02

1 Answer

Thanks a lot for the help, everyone. I think I just figured out how to do it. The important thing is the proxy. If I use:

> opts <- list(
     proxy         = "http://*******",
     proxyusername = "*****", 
     proxypassword = "*****", 
     proxyport     = 8080
)
> url.exists("http://www.google.com",.opts = opts)
[1] TRUE

Then it works. On Windows 10 you can find your proxy under System --> Proxy. The same options work with getForm:

 > site <- getForm("http://www.google.com.au", hl="en",
                 lr="", q="r-project", btnG="Search",.opts = opts)
 > htmlTreeParse(site)
 $file
 [1] "<buffer>"
 .........

Note that getForm needs the same .opts argument (htmlTreeParse comes from the XML package). Two other posts ("RCurl default proxy settings" and "Proxy setting for R") answer the same question. I have not yet tried extracting information from the parsed result.
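To avoid passing the proxy options to every call, they can probably be registered once as session defaults. This is a minimal sketch, assuming RCurl picks up the `RCurlOptions` R option when it creates a curl handle. The host, user name and password below are hypothetical placeholders, not real values.

```r
# Hypothetical placeholder values; replace them with your own proxy details.
proxy_opts <- list(
  proxy         = "http://proxy.example.com",  # assumed proxy host
  proxyport     = 8080,
  proxyusername = "user",
  proxypassword = "secret"
)

# Register the proxy settings as session-wide defaults for RCurl.
options(RCurlOptions = proxy_opts)

# Only touch the network in an interactive session with RCurl installed.
if (interactive() && requireNamespace("RCurl", quietly = TRUE)) {
  # No .opts argument here; the defaults registered above should be used.
  # The result depends on your network, so no output is shown.
  print(RCurl::url.exists("http://www.google.com"))
}
```

With the defaults registered, later calls such as getForm should not need an explicit .opts argument, although I have only reasoned about this from the RCurl documentation rather than tested it behind this particular proxy.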

Feng Chen