24

I would like to store the cookies from one open-uri call and pass them to the next one. I can't seem to find the right docs for doing this. I'd appreciate it if you could tell me the right way to do this.
NOTES: w3.org is not the actual url, but it's shorter; pretend cookies matter here.

h1 = open("http://www.w3.org/")
h2 = open("http://www.w3.org/People/Berners-Lee/", "Cookie" => h1.FixThisSpot)

Update after 2 nays: This wasn't intended as a rhetorical question, and I guarantee that it's possible. Update after tumbleweeds: See the answer; it's possible. It took me a good while, but it works.

dlamblin
  • 2
    For what you're trying to do I'd recommend using [Mechanize](http://mechanize.rubyforge.org/mechanize/). It's designed for this sort of thing. From its description: "The Mechanize library is used for automating interaction with websites. Mechanize automatically stores and sends cookies, follows redirects, can follow links, and submit forms. Form fields can be populated and submitted. Mechanize also keeps track of the sites that you have visited as a history." – the Tin Man Jan 08 '11 at 04:44
  • That mechanize link is dead, here's the new one http://mechanize.rubyforge.org/ – MCB Feb 28 '14 at 15:14
  • 1
    Mechanize is now on github: https://github.com/sparklemotion/mechanize – JESii Apr 07 '14 at 15:35

6 Answers

32

I thought someone would just know, but I guess it's not commonly done with open-uri. Here's the ugly version, which checks neither privacy, expiration, the correct domain, nor the correct path:

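require 'open-uri'

# On Ruby 3.0 and later, Kernel#open no longer fetches URLs; use URI.open.
h1 = URI.open("http://www.w3.org/")
h2 = URI.open("http://www.w3.org/People/Berners-Lee/",
              "Cookie" => h1.meta['set-cookie'].split('; ',2)[0])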

Yes, it works. No, it's not pretty, nor is it fully compliant with recommendations, nor does it handle multiple cookies (as is).
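The multiple-cookie limitation can be worked around with a small helper. The following is only a sketch: `cookie_header` is a hypothetical name, and it assumes you can get each Set-Cookie value as a separate string (with open-uri, `metas['set-cookie']` should provide that on Ruby versions that have `OpenURI::Meta#metas`). Like the one-liner, it still ignores domain, path, and expiration.

```ruby
# Build a "Cookie" request header from the raw Set-Cookie values of a response.
# Attributes such as Path, Domain, Expires, and Secure are discarded without
# being checked, so this is no more standards-compliant than the one-liner.
# cookie_header is a hypothetical helper name, not part of open-uri.
def cookie_header(set_cookie_values)
  Array(set_cookie_values)
    .map { |value| value.split(';', 2).first.to_s.strip } # keep only "name=value"
    .reject(&:empty?)
    .join('; ')
end

# Hypothetical usage with open-uri (needs network access, so left commented out):
#   require 'open-uri'
#   h1 = URI.open("http://www.w3.org/")
#   h2 = URI.open("http://www.w3.org/People/Berners-Lee/",
#                 "Cookie" => cookie_header(h1.metas['set-cookie']))
```

Splitting on the first `;` rather than on `, ` matters because an `Expires` attribute contains a comma of its own, which makes a comma-joined header value ambiguous.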

Clearly, HTTP is a very straightforward protocol, and open-uri gives you access to most of it. I guess what I really needed to know was how to get the cookie out of the h1 request; passing it to the h2 request was the part I already knew and showed. The surprising thing here is how many people answered by telling me not to use open-uri, and only one of those showed how to get a cookie set in one request passed to the next request.

dlamblin
13

You need to add a "Cookie" header.

I'm not sure if open-uri can do this or not, but it can be done using Net::HTTP.

require 'net/http'

# Create a new connection object.
conn = Net::HTTP.new(site, port)

# Get the response when we log in, to set the cookie.
# body is the encoded arguments to log in.
# Since Ruby 1.9, post and get return only the response object.
resp = conn.post(login_path, body)
cookie = resp['set-cookie']

# Headers need to be in a hash.
headers = { "Cookie" => cookie }

# On a get, we don't need a body.
resp = conn.get(path, headers)
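When a login response sets more than one cookie, reading a single header value is not enough. A minimal sketch, assuming only the standard `Net::HTTPHeader#get_fields` method: `cookies_from` is a hypothetical helper name, and the response below is built by hand purely so the example runs without a network connection.

```ruby
require 'net/http'

# Collect every Set-Cookie header on a Net::HTTP response and reduce each
# one to its "name=value" pair. Returns nil when no cookie was set.
# cookies_from is a hypothetical helper name, not part of Net::HTTP.
def cookies_from(response)
  fields = response.get_fields('set-cookie')
  return nil if fields.nil?

  fields.map { |field| field.split(';', 2).first.to_s.strip }.join('; ')
end

# Offline demonstration with a hand-built response object.
resp = Net::HTTPOK.new('1.1', '200', 'OK')
resp.add_field('Set-Cookie', 'session=abc123; Path=/; HttpOnly')
resp.add_field('Set-Cookie', 'theme=dark; Max-Age=3600')
puts cookies_from(resp)
```

With a real connection, the result would be passed along as `{ "Cookie" => cookies_from(resp) }` in the headers hash of the next request.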
Matthew Schinckel
4

Thanks, Matthew Schinckel, your answer was really useful. Using Net::HTTP, I was successful:

require 'net/http'

# Create a new connection object.
site = "google.com"
port = 80
conn = Net::HTTP.new(site, port)

# Get the response when we log in, to set the cookie.
# body is the encoded arguments to log in; login_path and path
# must be set to the paths on your site.
# Since Ruby 1.9, post and get return only the response object.
resp = conn.post(login_path, body)
cookie = resp['set-cookie']

# Headers need to be in a hash.
headers = { "Cookie" => cookie }

# On a get, we don't need a body.
resp = conn.get(path, headers)

puts resp.body
Shawn Chin
Amal Kumar S
2

Depending on what you are trying to accomplish, check out webrat. I know it is usually used for testing, but it can also hit live sites, and it does a lot of the stuff that your web browser would do for you, like store cookies between requests and follow redirects.

builder-7000
John F. Miller
  • 2
    I'd recommend Mechanize instead. It will hit live sites, handle cookies and follow redirects too, and, it was actually designed to do all that. – the Tin Man Jan 08 '11 at 04:43
1

If you are using open-uri, you would have to roll your own cookie support: parse the meta headers when reading a response, and add a Cookie header when submitting a request. Consider using httpclient (http://raa.ruby-lang.org/project/httpclient/) or something like Mechanize (http://mechanize.rubyforge.org/) instead, as they have cookie support built in.
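To give a sense of what rolling your own involves, here is a rough sketch of a jar that remembers cookies across several responses. `TinyCookieJar` is a hypothetical class written for illustration, not part of open-uri or any library, and it deliberately skips the domain, path, expiry, and Secure checks that a real cookie jar performs.

```ruby
# Minimal hand-rolled cookie jar: remembers name/value pairs across responses.
# It ignores domain, path, expiry, and Secure, so it is only suitable for
# talking to a single trusted host. TinyCookieJar is a hypothetical name.
class TinyCookieJar
  def initialize
    @cookies = {}
  end

  # Record cookies from an array of raw Set-Cookie header values.
  def store(set_cookie_values)
    Array(set_cookie_values).each do |raw|
      pair = raw.split(';', 2).first.to_s
      name, value = pair.split('=', 2)
      next if name.nil? || name.strip.empty?

      @cookies[name.strip] = value.to_s.strip
    end
    self
  end

  # Value for the "Cookie" request header, or nil when the jar is empty.
  def header
    return nil if @cookies.empty?

    @cookies.map { |name, value| "#{name}=#{value}" }.join('; ')
  end
end

jar = TinyCookieJar.new
jar.store(['sid=one; Path=/', 'lang=en'])
jar.store(['sid=two; HttpOnly']) # a later response replaces sid
puts jar.header
```

After each response you would call `store` with that response's Set-Cookie values, and pass `jar.header` as the `"Cookie"` option of the next request. Anything beyond a single trusted host is a good reason to reach for one of the libraries mentioned above.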

ADAM
  • I'm afraid "no support for cookies" is too strong of a choice of words. I do appreciate the links though. The latter's documentation seems sparse. http://mechanize.rubyforge.org/mechanize/WWW/Mechanize/CookieJar.html – dlamblin Sep 02 '09 at 00:57
  • Ruby's Mechanize is closely based on Perl's WWW::Mechanize, which has some nice docs. The description of cookies in the Perl docs should help figure out how the Ruby version works. It's been a while since I used it, but I think it'll supply a cookie jar and will handle them automatically. You can define your own jar if you want to switch them in or out or store it on disk for later reuse. – the Tin Man Jan 08 '11 at 04:52
0

For those who want standards-compliant cookie handling, there is a cookie jar implementation of RFC 2109 and RFC 2965 here:

https://github.com/dwaite/cookiejar

Darwin