
I'm trying to scrape this page with the Requests package in Python: http://photo.net/nikon-camera-forum/00aoms. The page loads fine when I enter the URL in a browser, but requests.get(url).text returns the following message instead, and I don't know what the problem is:

"photo.net Temporarily Unavailable 
photo.net 
Sun Jul 13 19:26:33 EDT 2014 — photo.net is down temporarily for 
system maintenance. Please visit us again later."
user3821329

1 Answer

The site performs a simple User-Agent header check, so provide a browser-like one:

>>> import requests
>>> response = requests.get('http://photo.net/nikon-camera-forum/00aoms', headers={'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4)'})
>>> print(response.text)
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html xmlns:fb="http://www.facebook.com/2008/fbml" xmlns:og="http://opengraphprotocol.org/schema/">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<script type="text/javascript">var _sf_startpt=(new Date()).getTime()</script>

<title>D800 wifi options? - Photo.net Nikon Forum</title>
...
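If your scraper makes several requests to the site, one option is to set the header once on a requests.Session instead of passing it to every call. By default, requests identifies itself as python-requests/<version>, which is presumably what the site's check rejects. The following is a minimal sketch under that assumption. The names make_session and fetch_thread and the 10-second timeout are illustrative choices, not part of the original answer.

```python
import requests

# Browser-like User-Agent taken from the answer above; any realistic browser
# UA string should presumably work, but that is an assumption.
BROWSER_UA = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4)"


def make_session(user_agent=BROWSER_UA):
    """Return a Session that sends user_agent on every request (hypothetical helper)."""
    session = requests.Session()
    # Session-level headers are merged into every request made through the session,
    # replacing the default "python-requests/<version>" User-Agent.
    session.headers.update({"User-Agent": user_agent})
    return session


def fetch_thread(session, url):
    """Download url with the session and return the decoded body (hypothetical helper)."""
    response = session.get(url, timeout=10)  # timeout value is an arbitrary choice
    response.raise_for_status()  # raise on 4xx/5xx status codes
    return response.text
```

Usage: session = make_session(), then html = fetch_thread(session, 'http://photo.net/nikon-camera-forum/00aoms'). Reusing one session also keeps cookies and connection pooling across requests.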

FYI, this is what the response looked like without passing the header:

>>> response = requests.get('http://photo.net/nikon-camera-forum/00aoms')
>>> print(response.text)
<html><head><title>photo.net Temporarily Unavailable</title></head>
<center><h2>photo.net </h2>
<p><i>Sun Jul 13 19:46:33 EDT 2014</i>&nbsp;&mdash; photo.net is down temporarily for 
system maintenance.  Please visit us again later.
</center>
</body>
</html>
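Because the blocked response is an ordinary HTML page rather than an error status, one way to notice it in a scraper is to check the page title. The following is a minimal sketch that assumes the placeholder title always contains "Temporarily Unavailable", as in the output above. This is a heuristic, not a documented behavior of the site. The helpers page_title, is_block_page and fetch_html are hypothetical. fetch_html takes the HTTP call as a parameter so the retry logic stays independent of any particular library.

```python
import re

# Text seen in the placeholder page's <title> above (assumed to be stable).
BLOCK_TITLE = "Temporarily Unavailable"
BROWSER_UA = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4)"


def page_title(html):
    """Return the stripped contents of the first <title> element, or None."""
    match = re.search(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else None


def is_block_page(html):
    """Heuristically detect the 'Temporarily Unavailable' placeholder page."""
    title = page_title(html)
    return title is not None and BLOCK_TITLE in title


def fetch_html(get, url):
    """Fetch url via get(url, headers) -> str, retrying once with a browser UA if blocked."""
    html = get(url, {})
    if is_block_page(html):
        html = get(url, {"User-Agent": BROWSER_UA})
    return html
```

With Requests, get could be lambda u, h: requests.get(u, headers=h).text.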
alecxe