I'm trying to take the Netscape HTTP Cookie File that curl spits out and convert it to a CookieJar that the Requests library can work with. I have netscapeCookieString in my Python script as a variable, which looks like:
# Netscape HTTP Cookie File
# https://curl.haxx.se/docs/http-cookies.html
# This file was generated by libcurl! Edit at your own risk.
.miami.edu TRUE / TRUE 0 PS_LASTSITE https://canelink.miami.edu/psc/PUMI2J/
Since I don't want to parse the cookie file myself, I'd like to use cookielib. Sadly, this means I have to write to disk since cookielib.MozillaCookieJar() won't take a string as input: it has to take a file.
So I'm using NamedTemporaryFile (couldn't get SpooledTemporaryFile to work; again would like to do all of this in memory if possible).
tempCookieFile = tempfile.NamedTemporaryFile()
# now take the contents of the cookie string and put it into this in memory file
# that cookielib will read from. There are a couple quirks though.
for line in netscapeCookieString.splitlines():
    # cookielib doesn't know how to handle httpOnly cookies correctly
    # so we have to do some pre-processing to make sure they make it into
    # the cookielib. Basically just removing the httpOnly prefix which is honestly
    # an abuse of the RFC in the first place. note: httpOnly actually refers to
    # cookies that javascript can't access, as in only http protocol can
    # access them, it has nothing to do with http vs https. it's purely
    # to protect against XSS a bit better. These cookies may actually end up
    # being the most critical of all cookies in a given set.
    # https://stackoverflow.com/a/53384267/2611730
    if line.startswith("#HttpOnly_"):
        # this is actually how the curl library removes the httpOnly, by doing length
        line = line[len("#HttpOnly_"):]
    tempCookieFile.write(line)
tempCookieFile.flush()
# another thing that cookielib doesn't handle very well is
# session cookies, which have 0 in the expires param
# so we have to make sure they don't get expired when they're
# read in by cookielib
# https://stackoverflow.com/a/14759698/2611730
print tempCookieFile.read()
cookieJar = cookielib.MozillaCookieJar(tempCookieFile.name)
cookieJar.load(ignore_expires=True)
pprint.pprint(cookieJar)
But here's the kicker, this doesn't work!
print tempCookieFile.read() prints an empty line.
Thus, pprint.pprint(cookieJar) prints an empty cookie jar.
I was easily able to reproduce this on my Mac:
>>> import tempfile
>>> tempCookieFile = tempfile.NamedTemporaryFile()
>>> tempCookieFile.write("hey")
>>> tempCookieFile.flush()
>>> print tempCookieFile.read()
>>>
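One thing worth checking in that reproduction is where the file position ends up. The sketch below repeats the same steps in Python 3 (an assumption; the session above is Python 2), where NamedTemporaryFile defaults to binary mode, so it writes bytes. The variable names are made up for illustration. It suggests the data is written, but the read starts from the end of the file until the position is rewound with seek(0).

```python
import tempfile

# NamedTemporaryFile opens in "w+b" mode by default, so write bytes here.
with tempfile.NamedTemporaryFile() as temp_file:
    temp_file.write(b"hey")
    temp_file.flush()

    # After writing, the file position sits just past the written data.
    position_after_write = temp_file.tell()

    # Reading from the end of the file yields nothing.
    read_at_end = temp_file.read()

    # Rewind to the start, then read again.
    temp_file.seek(0)
    read_after_rewind = temp_file.read()

print(position_after_write, read_at_end, read_after_rewind)
# prints: 3 b'' b'hey'
```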
How can I actually write to a NamedTemporaryFile?
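For what it's worth, here is a minimal sketch of the whole flow in Python 3, under a few assumptions: http.cookiejar is used as the Python 3 name for cookielib, the cookie fields are tab-separated as in the Netscape format, and the temp file can be reopened by name while it is still open (true on macOS and Linux, not on Windows). The helper name load_netscape_cookies is hypothetical. One likely cause of the empty jar is that splitlines() drops the line endings, so every line gets written onto a single line that the loader treats as just the header; the sketch writes a newline back after each line.

```python
import http.cookiejar  # Python 3 name for Python 2's cookielib
import tempfile

HTTP_ONLY_PREFIX = "#HttpOnly_"


def load_netscape_cookies(netscape_cookie_string):
    """Hypothetical helper: load a Netscape cookie string into a MozillaCookieJar."""
    # Text mode ("w+") so that str can be written directly.
    with tempfile.NamedTemporaryFile(mode="w+", suffix=".txt") as temp_cookie_file:
        for line in netscape_cookie_string.splitlines():
            # Strip the httpOnly marker so the line is not skipped as a comment.
            if line.startswith(HTTP_ONLY_PREFIX):
                line = line[len(HTTP_ONLY_PREFIX):]
            # splitlines() removed the line ending, so add one back.
            temp_cookie_file.write(line + "\n")
        # Push the buffered text to disk before the jar reopens the file by name.
        temp_cookie_file.flush()

        cookie_jar = http.cookiejar.MozillaCookieJar(temp_cookie_file.name)
        # Session cookies carry 0 in the expires field; keep them while loading.
        cookie_jar.load(ignore_expires=True)
    return cookie_jar


# Sample input modelled on the cookie file above, with tab-separated fields.
sample = (
    "# Netscape HTTP Cookie File\n"
    "# https://curl.haxx.se/docs/http-cookies.html\n"
    "# This file was generated by libcurl! Edit at your own risk.\n"
    "\n"
    ".miami.edu\tTRUE\t/\tTRUE\t0\tPS_LASTSITE\t"
    "https://canelink.miami.edu/psc/PUMI2J/\n"
)

jar = load_netscape_cookies(sample)
cookie_names = [cookie.name for cookie in jar]
cookie_values = [cookie.value for cookie in jar]
print(cookie_names)
# prints: ['PS_LASTSITE']
```

Note that the jar reads the file through its own handle opened by name, so the position of the writing handle does not matter for load(); only flush() does.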