
I'm trying to use requests and bs4 to get info from a website, but I am receiving status code 304 and no content from requests.get(). I've done some reading and understand that this code indicates the resource is already in my cache. How can I either access the resource from my cache or, preferably, clear my cache so that I can fetch the resource fresh?

I've tried adding the following header: headers={'Cache-Control': 'no-cache'} to requests.get() but still have the same issue.

Additionally I've looked into the requests-cache module, but am unclear on how or if this could be used to solve the problem.

code:

import requests

r = requests.get('https://smsreceivefree.com/')

print(r.status_code)
print(r.content)

output:

304
b''
asheets
  • I think they are sending you that response even though (according to the HTTP spec) they shouldn't: 304 is only sent when the request was conditional (i.e. had an `If-Modified-Since` header). Since you aren't sending one, it seems they just reply with that to "botty" user-agents like the default one used by requests. – L3viathan Sep 01 '18 at 14:48
  • @L3viathan That makes a lot more sense than what I was reading. I changed the user-agent and now everything works! If you post this as an answer, I can accept it – asheets Sep 01 '18 at 14:56

1 Answer


A server should send a 304 Not Modified reply only if the client sent a conditional request, such as one with an If-Modified-Since header. This makes sense when the client already has a cached copy of the page and wants to avoid downloading the content again if that copy is still current.
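One way to confirm that a 304 is unjustified is to check whether the request actually carried a conditional header. The snippet below is a minimal sketch: `is_conditional` is a hypothetical helper name, and it only looks at the two headers relevant to 304 responses.

```python
# Request headers that make a GET conditional with respect to a 304 reply.
CONDITIONAL_HEADERS = {"if-modified-since", "if-none-match"}


def is_conditional(headers):
    """Return True if any header name in the mapping makes the request conditional."""
    # Header names are case-insensitive, so compare them in lower case.
    return any(name.lower() in CONDITIONAL_HEADERS for name in headers)
```

With requests, the headers that were actually sent are available as `r.request.headers`, so `is_conditional(r.request.headers)` returning False alongside a 304 status suggests the server is not following the spec.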

In this case, the website appears to send a 304 to certain kinds of clients: those whose User-Agent indicates automation (which is true in your case, since requests identifies itself as python-requests by default).

The server should instead send a 4xx error code, probably a 403 Forbidden, but it likely uses a 304 to throw bot writers off the right track and make them come to Stack Overflow.
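As the asker confirmed in the comments, changing the User-Agent was enough here. A minimal sketch of that workaround follows; the `BROWSER_UA` string and the `browser_headers` and `fetch` helpers are illustrative assumptions, not values the site is known to require.

```python
import requests

# Hypothetical browser-like User-Agent; any realistic browser string may work.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"


def browser_headers(user_agent=BROWSER_UA):
    """Build headers that replace the default python-requests User-Agent."""
    return {"User-Agent": user_agent}


def fetch(url, user_agent=BROWSER_UA):
    """GET the URL while presenting the given User-Agent."""
    return requests.get(url, headers=browser_headers(user_agent), timeout=10)
```

Calling `r = fetch('https://smsreceivefree.com/')` and printing `r.status_code` should then show whether the server still answers with a 304. The site may change its checks at any time, so this is not guaranteed to keep working.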

L3viathan