
I have a Python program that periodically checks the weather from weather.yahooapis.com, but it always throws the error `urllib2.HTTPError: HTTP Error 404: Not Found on Accelerator`. I have tried it on two different computers and changed my DNS settings, but I still get the error. Here is my code:

#!/usr/bin/python

import time
#from Adafruit_CharLCDPlate import Adafruit_CharLCDPlate
from xml.dom import minidom
import urllib2

#towns, as woeids
towns = [2365345,2366030,2452373]

val = 1
while val == 1:
    time.sleep(2)
    for i in towns:
        mdata = urllib2.urlopen('http://206.190.43.214/forecastrss?w='+str(i)+'&u=f')
        sdata = minidom.parseString(mdata)
        atm = sdata.getElementsByTagName('yweather:atmosphere')[0]
        current = sdata.getElementsByTagName('yweather:condition')[0]
        humid = atm.attributes['humidity'].value
        tempf = current.attributes['temp'].value
        print(tempf)
        time.sleep(8)

I can successfully access the output of the API through a web browser on the same computers that give me the error.

TheDoctor
  • When I try this in a browser, I get the exact same "Not Found on Accelerator" 404 error. Are you sure your browser doesn't have this cached? – abarnert Oct 03 '13 at 01:34

1 Answer


The problem is that you're using the IP address 206.190.43.214 rather than the hostname weather.yahooapis.com.

Even though they resolve to the same host (206.190.43.214, obviously), the name that's actually in the URL ends up as the Host: header in the HTTP request. And you can tell that this makes the difference here:

$ curl 'http://206.190.43.214/forecastrss?w=2365345&u=f'
<404 error>
$ curl 'http://weather.yahooapis.com/forecastrss?w=2365345&u=f'
<correct rss>
$ curl 'http://206.190.43.214/forecastrss?w=2365345&u=f' -H 'Host: weather.yahooapis.com'
<correct rss>

If you test the two URLs in your browser, you will see the same thing.
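
To make the difference concrete, here is a minimal sketch of the first two lines an HTTP/1.1 client would put on the wire for each URL. The helper name `request_head` is hypothetical and exists only for illustration; real clients send more headers than this, and nothing here touches the network.

```python
try:
    from urllib.parse import urlsplit  # Python 3
except ImportError:
    from urlparse import urlsplit      # Python 2


def request_head(url, host=None):
    """Return the request line and Host header for a GET of `url`.

    `host` overrides the Host header, like curl's -H 'Host: ...'.
    Hypothetical helper for illustration; it sends nothing.
    """
    parts = urlsplit(url)
    target = parts.path or '/'
    if parts.query:
        target += '?' + parts.query
    # By default the Host header is whatever name appears in the URL.
    return 'GET %s HTTP/1.1\r\nHost: %s\r\n' % (target, host or parts.netloc)


by_ip = request_head('http://206.190.43.214/forecastrss?w=2365345&u=f')
by_name = request_head('http://weather.yahooapis.com/forecastrss?w=2365345&u=f')
by_ip_fixed = request_head('http://206.190.43.214/forecastrss?w=2365345&u=f',
                           host='weather.yahooapis.com')
```

The request line is identical in all three cases; only the `Host:` value differs, and the third call yields the same text as the second.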


So, in your code, you have two choices. You can use the DNS name instead of the IP address:

mdata = urllib2.urlopen('http://weather.yahooapis.com/forecastrss?w='+str(i)+'&u=f')

… or you can use the IP address and add the Host header manually:

req = urllib2.Request('http://206.190.43.214/forecastrss?w='+str(i)+'&u=f')
req.add_header('Host', 'weather.yahooapis.com')
mdata = urllib2.urlopen(req)
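
Note that `urllib2` exists only in Python 2; in Python 3 the same `Request` class lives in `urllib.request`. As a rough sketch, the second option could be wrapped so it runs on either version. The function name `build_forecast_request` and its parameters are my own invention for illustration, and the sketch only builds the request without sending it.

```python
try:
    from urllib.request import Request  # Python 3
except ImportError:
    from urllib2 import Request         # Python 2


def build_forecast_request(ip, hostname, woeid):
    """Build a request addressed to `ip` that still names `hostname`.

    Hypothetical helper: the URL uses the cached IP address, while the
    Host header carries the DNS name the server expects.
    """
    req = Request('http://%s/forecastrss?w=%s&u=f' % (ip, woeid))
    req.add_header('Host', hostname)
    return req


req = build_forecast_request('206.190.43.214', 'weather.yahooapis.com', 2365345)
# Inspect the result without touching the network.
url = req.get_full_url()
host_header = req.get_header('Host')
```

Passing `req` to `urlopen` (from `urllib.request` or `urllib2`) would then send it, exactly as in the three-line version above.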

There's at least one other problem in your code once you fix this. You can't call minidom.parseString(mdata) when mdata is the file-like response object that urlopen returns; you either need to call read() on it and pass the result to parseString, or use parse instead of parseString.
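
Both fixes can be tried without a network connection. In this sketch an `io.BytesIO` stands in for the response object, and the XML document, its namespace URI, and the helper names are made up for illustration; only the element and attribute names come from the question.

```python
import io
from xml.dom import minidom

# Invented sample feed; the namespace URI is a placeholder.
SAMPLE_RSS = b"""<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:yweather="http://example.com/yweather">
  <channel>
    <yweather:atmosphere humidity="40"/>
    <item>
      <yweather:condition temp="72"/>
    </item>
  </channel>
</rss>"""


def temp_via_parse_string(response):
    # parseString wants the document itself, so read the response first.
    doc = minidom.parseString(response.read())
    return doc.getElementsByTagName('yweather:condition')[0].attributes['temp'].value


def temp_via_parse(response):
    # parse accepts a file-like object and reads from it directly.
    doc = minidom.parse(response)
    return doc.getElementsByTagName('yweather:condition')[0].attributes['temp'].value


temp_a = temp_via_parse_string(io.BytesIO(SAMPLE_RSS))
temp_b = temp_via_parse(io.BytesIO(SAMPLE_RSS))
```

In the real loop, `mdata` would take the place of the `io.BytesIO` object.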

abarnert
  • That is strange. It's now working on my laptop, but my Raspberry Pi is throwing the error: `urllib2.URLError: `. I had originally made the code get the IP address with urllib.socket.gethostbyname('weather.yahooapis.com') to work around this (http://stackoverflow.com/questions/18007174/python-urllib2-force-ipv4) – TheDoctor Oct 03 '13 at 01:53
  • @TheDoctor: If you're having (intermittent?) DNS problems, and you want to cache the IP address and use that in the URL instead of the hostname, then you have to use the 3-line version that sets the Host header explicitly, as explained in my answer. (It would be a little simpler with `requests` instead of the stdlib `urllib2`, if three lines bothers you.) – abarnert Oct 03 '13 at 02:03
  • Meanwhile, the reason it sometimes works is probably something like this: Yahoo has a slew of frontends and a slew of backends, rather than having one computer per IP and one IP per hostname, so they can load balance effectively. If you happen to get routed to the same frontend you just recently hit, it'll remember which backend server you wanted and route you there without checking. Otherwise, it'll have to read the `Host:` header to decide which backend to route you to, and, since you don't have one, you'll get the error. – abarnert Oct 03 '13 at 02:07
  • Raspberry Pi DNS seems to be very glitchy, and I have fixed the problem using the 'three-line solution' – TheDoctor Oct 03 '13 at 02:34