
I need to download information from a website that is protected by a cookie-based login. I log in manually and then pass the browser's cookies to httr.

Here is a similar topic, but it does not solve my problem: (Copying cookie for httr)

library(httr)
url <- "http://smida.gov.ua/db/emitent/year/xml/showform/32153/125/templ"

cook <- "_SMIDA=9117a9eb136353bd6956651bd59acd37; __utmt=1; __utma=29983421.1729484844.1413489369.1413625619.1413627797.3; __utmb=29983421.7.10.1413627797; __utmc=29983421; __utmz=29983421.1413489369.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)"

response <- GET(url, config(cookie = cook))

content(x = response, as = "text", encoding = "UTF-8")

When I call content(), it returns a page saying that I am not logged in (the same result as without the cookie).

How can I solve this problem?

Test credentials: login mytest2, password qwerty12.

VadymB

1 Answer


This is how to pass cookies with set_cookies() in a GET() call using httr:

library(httr)

# Each cookie is passed as a separate named argument; backticks are
# needed because the names start with an underscore.
GET("http://smida.gov.ua/db/emitent/year/xml/showform/32153/125/templ", 
    set_cookies(`_SMIDA` = "7cf9ea4bfadb60bbd0950e2f8f4c279d",
                `__utma` = "29983421.138599299.1413649536.1413649536.1413649536.1",
                `__utmb` = "29983421.5.10.1413649536",
                `__utmc` = "29983421",
                `__utmt` = "1",
                `__utmz` = "29983421.1413649536.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)"))

That worked for me; at least I think it did, as I cannot read the language. A table comes back with the same structure and no prompt to log in.

Unfortunately, the captcha on the login page prevents the use of RSelenium (or other similar crawling packages), so you'll have to continue grabbing those cookies manually (or use a utility to extract them from the session).
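Since the cookies have to be copied by hand, it may be easier to paste the whole Cookie header string (as in the question) and split it into the named values that set_cookies() expects. The following is a minimal sketch in base R; `parse_cookie_string` is a hypothetical helper name, not part of httr, and it assumes the string uses the usual "name=value; name=value" layout.

```r
# Hypothetical helper (not part of httr): turn a raw Cookie header string
# into a named character vector, one element per cookie.
parse_cookie_string <- function(cookie_string) {
  pairs <- strsplit(cookie_string, ";\\s*")[[1]]
  # Split each pair on the FIRST "=" only, because values such as
  # __utmz contain "=" themselves.
  eq <- regexpr("=", pairs, fixed = TRUE)
  values <- substring(pairs, eq + 1)
  names(values) <- trimws(substring(pairs, 1, eq - 1))
  values
}

# Cookie string copied from the question
cook <- "_SMIDA=9117a9eb136353bd6956651bd59acd37; __utmt=1; __utma=29983421.1729484844.1413489369.1413625619.1413627797.3; __utmb=29983421.7.10.1413627797; __utmc=29983421; __utmz=29983421.1413489369.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)"

cookies <- parse_cookie_string(cook)

# With httr loaded and a still-valid session, the vector can be supplied
# through the .cookies argument of set_cookies():
# response <- GET(url, set_cookies(.cookies = cookies))
```

The helper only reshapes the string; whether the site accepts the cookies still depends on the session being valid.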

Finally, you probably want to seriously consider changing those credentials, now :-)


EDIT: @VadymB and I both found that the code didn't work until we rebooted RStudio. Your mileage may vary.
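One plausible explanation for the restart behaviour, not confirmed anywhere in this thread, is that httr reuses a single connection handle per host within an R session, and that handle keeps cookies from earlier requests. A minimal sketch, assuming that is the cause: handle_reset() discards the pooled handle for a host, and handle() creates a separate one. The variable names here are illustrative only.

```r
library(httr)

base_url <- "http://smida.gov.ua"

# Drop the pooled handle for this host, along with any cookies it has
# accumulated during the session (no request is sent).
handle_reset(base_url)

# Alternatively, create a dedicated handle, for example one per account,
# so that each keeps its own cookies.
h_account1 <- handle(base_url)

# A request through that handle would then look like this
# ("<session id>" is a placeholder for a freshly copied cookie value):
# response <- GET(handle = h_account1,
#                 path = "db/emitent/year/xml/showform/32153/125/templ",
#                 set_cookies(`_SMIDA` = "<session id>"))
```

If this assumption holds, resetting the handle might stand in for restarting RStudio, but that has not been verified against this site.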

hrbrmstr
  • Thanks, it helped! But it was really strange: this code didn't work unless I rebooted RStudio =\ – VadymB Oct 18 '14 at 21:08
  • Can you also explain the following: if I run this code a second time, it doesn't work, because the site rejects these cookies. I tried reset_config(), but nothing happens. This is a real problem for me because I'd like to create 5-10 accounts and download data simultaneously. – VadymB Oct 18 '14 at 21:57
  • +1 This should really be part of the documentation as an example, because this syntax is not discoverable otherwise. – Brandon Bertelsen Mar 31 '15 at 17:50
  • @VadymB: Same thing for me. This code didn't work until I rebooted RStudio! – Jim G. Mar 31 '18 at 20:44
  • Not specific to this example, but I ran into it when trying to set cookies: if I grabbed them from Chrome DevTools, the cookie values were often URL-encoded. They need to be passed through `URLdecode`, otherwise I think httr tries to re-encode them. – Marcus Apr 23 '21 at 19:44
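The decoding step from the last comment can be sketched with base R's `utils::URLdecode`. The encoded string below is a hypothetical example of how a browser might display the `__utmz` value; it is not taken from the thread.

```r
# Hypothetical URL-encoded value, as it might appear in browser dev tools
encoded <- "29983421.1413489369.1.1.utmcsr%3D(direct)%7Cutmccn%3D(direct)%7Cutmcmd%3D(none)"

# Decode percent-escapes (%3D is "=", %7C is "|") before handing the
# value to set_cookies()
decoded <- utils::URLdecode(encoded)
print(decoded)
```

The decoded string is what would then be passed as the cookie value.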