
How can I check my session cookies and specify those cookies before making a subsequent web request?

I want to scrape a page, but I can't work out how to send the cookies with my request.

I'm using the rvest library.

My code:

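library(rvest)
# Open a session and collect the response headers
WP <- html_session("http://www.wp.pl/")
headers <- httr::headers(WP)
# Keep only the Set-Cookie headers and split each into "; "-separated pieces
cookies <- unlist(headers[names(headers) == "set-cookie"])
crumbs <- stringr::str_split_fixed(cookies, "; ", 4)
# Method 1: split the first piece ("name=value") into name and value
stringr::str_split_fixed(crumbs[, 1], "=", 2)
# Method 2: ask the session for its cookies directly
cookies(WP)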

How do I set my cookies to do the web scraping?

Jim G.
AgnieszkaTomczyk

1 Answer

  1. Keep in mind that rvest is built on top of the httr library.
  2. For some reason that I can't explain, this code didn't work until I rebooted RStudio.

Here's some code that'll do the trick:

library(httr)
library(rvest)

httr::GET("http://www.wp.pl/", 
    set_cookies(`_SMIDA` = "7cf9ea4bfadb60bbd0950e2f8f4c279d",
                `__utma` = "29983421.138599299.1413649536.1413649536.1413649536.1",
                `__utmb` = "29983421.5.10.1413649536",
                `__utmc` = "29983421",
                `__utmt` = "1",
                `__utmz` = "29983421.1413649536.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)")) %>%
    read_html() %>%  # Parse the response body as HTML
    html_table(fill=TRUE) # Sample rvest code: extract the page's tables
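
The cookie values above are hard-coded. If you'd rather reuse whatever the server sent on a first request, here is a minimal sketch. It assumes that httr::cookies() returns a data frame with name and value columns and that set_cookies() accepts a named character vector through its .cookies argument. The helper names cookie_vector and get_with_cookies are made up for this example; they are not part of httr or rvest.

```r
# Turn a cookie data frame (as returned by httr::cookies()) into the
# named character vector that httr::set_cookies(.cookies = ...) expects.
# cookie_vector is a hypothetical helper, not an httr function.
cookie_vector <- function(cookie_df) {
  stats::setNames(as.character(cookie_df$value), as.character(cookie_df$name))
}

# Request `url`, sending the cookies in `cookie_df` along with it.
# get_with_cookies is also a hypothetical helper.
get_with_cookies <- function(url, cookie_df) {
  httr::GET(url, httr::set_cookies(.cookies = cookie_vector(cookie_df)))
}

# Usage (needs network access, so it only runs in an interactive session)
if (interactive()) {
  first <- httr::GET("http://www.wp.pl/")
  print(httr::cookies(first))  # inspect the cookies the server set
  second <- get_with_cookies("http://www.wp.pl/", httr::cookies(first))
  print(httr::status_code(second))
}
```

Note that httr already keeps cookies between requests to the same domain within one R session, so passing them explicitly like this is mostly useful when you want to inspect, filter, or override them first.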
Jim G.