20

I'm trying to write a simple web crawler using the Requests module, and I would like to know how to disable its default keep-alive feature.

I tried using:

s = requests.session()
s.config['keep_alive'] = False

However, I get an error stating that the session object has no attribute 'config'. I think this changed in a newer version, but I cannot find how to do it in the official documentation.

When I run the crawler on a specific website, it fetches five pages at most and then keeps looping indefinitely, so I suspect it has something to do with the keep-alive feature.

PS: Is Requests a good module for a web crawler? Is there something better suited?

Thank you!

Elrond
Acemad
  • 1
    This was [changed in 1.x](http://docs.python-requests.org/en/latest/api/#migrating-to-1-x) – Elrond May 06 '14 at 19:31

3 Answers

21

This works

s = requests.session()
s.keep_alive = False

Answered in the comments of a similar question.

Community
nfazzio
  • 1
    As for a web crawler recommendation: SO should not be used for opinion-based questions and answers. If you're interested in interacting with the web and web content, I would recommend researching packages like scrapy and beautifulsoup. – nfazzio Jan 09 '14 at 00:40
  • 6
    At least on the current requests version this doesn't work: requests still sends the keep-alive header. – MacHala Sep 04 '17 at 11:01
  • 2
    Doesn't work with requests 2.21.0. AttributeError: 'Session' object has no attribute 'keep_alive' – Shorin Mar 29 '19 at 04:18
9

I am not sure, but can you try passing {"Connection": "close"} as HTTP headers when sending a GET request with requests? This should close the connection as soon as the server returns a response.

>>> import requests
>>> headers = {"Connection": "close"}
>>> r = requests.get('https://example.com', headers=headers)
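For a crawler that reuses one session, the same header can be set once on the session instead of on every call. The following is only a minimal sketch: the helper name `make_non_persistent_session` and the URL `https://example.com/` are placeholders of mine, not part of Requests. It prepares a request without sending it, so you can check which `Connection` header would go out.

```python
import requests


def make_non_persistent_session():
    """Return a Session whose default headers ask for 'Connection: close'.

    Hypothetical helper for illustration; the name is not part of Requests.
    """
    session = requests.Session()
    # Session.headers holds defaults that are merged into every request.
    # Requests normally puts 'Connection: keep-alive' here.
    session.headers["Connection"] = "close"
    return session


session = make_non_persistent_session()

# Build the request without sending it, to inspect the merged headers.
# 'https://example.com/' is a placeholder URL.
request = requests.Request("GET", "https://example.com/")
prepared = session.prepare_request(request)

print(prepared.headers["Connection"])
```

Whether the connection is really torn down after each response still depends on the server honouring the header, so treat this as a starting point rather than a guarantee.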
praveen
  • 1
    I tried this, but aren't you supposed to use a POST request for that? Anyway, the problem still persists! – Acemad Jan 09 '14 at 00:11
5

As @praveen suggested, we are expected to send the HTTP/1.1 header Connection: close to tell the server that the connection should be closed once the response is complete.

Here is how it's described in RFC 2616:

HTTP/1.1 defines the "close" connection option for the sender to signal that the connection will be closed after completion of the response. For example,

Connection: close

in either the request or the response header fields indicates that the connection SHOULD NOT be considered `persistent' (section 8.1) after the current request/response is complete.

HTTP/1.1 applications that do not support persistent connections MUST include the "close" connection option in every message.

Ilya Khadykin