How to read the header with pycurl

Question

How do I read the response headers returned from a PyCurl request?

score 30 · Accepted Answer · answered Jan 23 '09 at 08:13

There are several solutions (by default, the response headers are dropped). Here is an example using the HEADERFUNCTION option, which lets you specify a function to handle them.

Other solutions are the WRITEHEADER option (not compatible with WRITEFUNCTION), or setting HEADER to True so that the headers are delivered along with the body.

#!/usr/bin/env python3

import pycurl

class Storage:
    """Accumulates the chunks pycurl passes to a callback, numbering each one."""

    def __init__(self):
        self.contents = ''
        self.line = 0

    def store(self, buf):
        # Under Python 3, pycurl passes bytes; ISO-8859-1 decoding never fails.
        self.line = self.line + 1
        self.contents = "%s%i: %s" % (self.contents, self.line, buf.decode('iso-8859-1'))

    def __str__(self):
        return self.contents

retrieved_body = Storage()
retrieved_headers = Storage()
c = pycurl.Curl()
c.setopt(c.URL, 'http://www.demaziere.fr/eve/')
c.setopt(c.WRITEFUNCTION, retrieved_body.store)      # called for each chunk of the body
c.setopt(c.HEADERFUNCTION, retrieved_headers.store)  # called once per header line
c.perform()
c.close()
print(retrieved_headers)
print(retrieved_body)
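
Since HEADERFUNCTION is called once per header line, the lines can also be collected into a dictionary rather than one long string. The following is only a minimal sketch under some assumptions: `HeaderCollector`, `add_line` and `fetch_headers` are hypothetical names, header lines are decoded as ISO-8859-1, names are lower-cased, and a repeated header (such as Set-Cookie) overwrites the earlier value.

```python
class HeaderCollector:
    """Collects header lines passed to HEADERFUNCTION into a dict (illustrative helper)."""

    def __init__(self):
        self.status_line = None
        self.headers = {}

    def add_line(self, line):
        # pycurl delivers each header line as bytes, including the trailing CRLF.
        text = line.decode('iso-8859-1').strip()
        if not text:
            return  # a blank line marks the end of a header block
        if text.startswith('HTTP/'):
            # A new status line (for example after a redirect) starts a fresh block.
            self.status_line = text
            self.headers = {}
            return
        name, sep, value = text.partition(':')
        if sep:
            # Assumption: a repeated header simply overwrites the previous value.
            self.headers[name.strip().lower()] = value.strip()


def fetch_headers(url):
    """Perform a GET with pycurl and return (status_line, headers); the body is discarded."""
    import pycurl  # imported here so HeaderCollector can be used without pycurl

    collector = HeaderCollector()
    c = pycurl.Curl()
    c.setopt(c.URL, url)
    c.setopt(c.HEADERFUNCTION, collector.add_line)
    c.setopt(c.WRITEFUNCTION, lambda chunk: None)  # ignore the body
    try:
        c.perform()
    finally:
        c.close()
    return collector.status_line, collector.headers
```

Calling `fetch_headers('http://www.demaziere.fr/eve/')` would then return the status line and a dict that can be queried with, for example, `headers.get('content-type')`.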

bortzmeyer

I'd like to use this without having to retrieve the contents. Is there a way to achieve this? My content is large (1.4 GB or so), and I just need to know the size, not the contents. – Alfe Feb 20 '13 at 15:12
@Alfe try making a `HEAD` request instead of a `GET`, something like `c.setopt(pycurl.CUSTOMREQUEST, "HEAD")` – Serge Dec 05 '14 at 14:35
Wow, that's a late follow-up, but thank you anyway. But now it's so long ago … Could well be that I did it that way, but actually, I can't remember :-} – Alfe Dec 05 '14 at 20:50
Unfortunately, doing a `HEAD` request will only work if no payload body needs to be sent along (as in a `GET` request, but unlike a `POST` or `PUT` request): https://stackoverflow.com/a/4529097/2545732 – Dirk Dec 12 '18 at 16:13

vontrapp · Answer 2 · 2014-02-14T18:13:56.087

17

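import pycurl
from io import BytesIO  # pycurl passes bytes to callbacks under Python 3

url = 'http://example.com/'  # placeholder: substitute the URL to query

headers = BytesIO()

c = pycurl.Curl()
c.setopt(c.URL, url)
c.setopt(c.HEADER, 1)  # also include the headers in the output stream
c.setopt(c.NOBODY, 1) # header only, no body
c.setopt(c.HEADERFUNCTION, headers.write)

c.perform()
c.close()

print(headers.getvalue().decode('iso-8859-1'))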

Add any other curl setopts as necessary/desired, such as FOLLOWLOCATION.
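
With FOLLOWLOCATION enabled, the header callback receives one header block per response in the redirect chain, so the captured buffer holds several blocks back to back. As a rough sketch (the helper name `last_header_block` is hypothetical, and it assumes CRLF line endings and ISO-8859-1 header text), the final response's headers, including Content-Length, could be pulled out like this:

```python
def last_header_block(raw):
    """Return the headers of the final response in a raw header buffer as a dict."""
    text = raw.decode('iso-8859-1')
    # Each response's headers end with a blank line; drop empty trailing pieces.
    blocks = [block for block in text.split('\r\n\r\n') if block.strip()]
    if not blocks:
        return {}
    headers = {}
    for line in blocks[-1].split('\r\n')[1:]:  # skip the status line
        name, sep, value = line.partition(':')
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers
```

Given the buffer captured by a header callback such as `headers.write`, `last_header_block(headers.getvalue()).get('content-length')` yields the reported size as a string, or None if the server did not send that header.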

edited Feb 14 '14 at 18:13

answered Feb 05 '14 at 01:20

vontrapp

This seems to be the only answer that ONLY fetches the headers. – Mike Furlender May 06 '15 at 19:15
I had to change StringIO to BytesIO for Python 3.6. – djsumdog Jun 05 '20 at 04:41

score 6 · Answer 3 · answered Sep 01 '11 at 11:54

Another alternative is human_curl (install it with pip install human_curl):

In [1]: import human_curl as hurl

In [2]: r = hurl.get("http://stackoverflow.com")

In [3]: r.headers
Out[3]: 
{'cache-control': 'public, max-age=45',
 'content-length': '198515',
 'content-type': 'text/html; charset=utf-8',
 'date': 'Thu, 01 Sep 2011 11:53:43 GMT',
 'expires': 'Thu, 01 Sep 2011 11:54:28 GMT',
 'last-modified': 'Thu, 01 Sep 2011 11:53:28 GMT',
 'vary': '*'}

Alexandr

When importing `human_curl` I get an error saying `ImportError: pycurl: libcurl link-time ssl backend (nss) is different from compile-time ssl backend (none/other)` – Ciasto piekarz Feb 28 '18 at 03:34

score 1 · Answer 4 · answered Jan 23 '09 at 09:26

This might or might not be an alternative for you:

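from urllib.request import urlopen  # Python 3 (in Python 2 this was urllib.urlopen)
# headers is a list of (name, value) pairs, one per response header
headers = urlopen('http://www.pythonchallenge.com').headers.items()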

PEZ
