I was told that threads can easily be combined with queues, but I have run into a problem. The code below is supposed to take each URL from a queue, fetch the page, and print its first 512 bytes.

from queue import Queue
from threading import Thread
import urllib.request

hosts = ["http://yahoo.com", "http://google.com", "http://amazon.com", "http://ibm.com", "http://apple.com"]

queue = Queue()

class ThreadUrl(Thread):
    def __init__(self, queue):
        Thread.__init__(self)
        self.queue = queue

    def run(self):
        while True:
            host = self.queue.get()
            url = urllib.request.urlopen(host)
            print(url.read(512))
            self.queue.task_done()

def main():
    for i in range(5):
        t = ThreadUrl(queue)
        t.daemon = True
        t.start()

    for host in hosts:
        queue.put(host)

    queue.join()

main()

I got this output, with a problem in the last thread:

b'<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="sr"><head><meta content="text/html; charset=UTF-8" http-equiv="Content-Type"><meta content="/images/branding/googleg/1x/googleg_standard_color_128dp.png" itemprop="image"><title>Google</title><script>(function(){window.google={kEI:\'hD3FWZiRJ8G2a8GfqdAF\',kEXPI:\'18168,1352613,1352960,1353383,1353747,1354276,1354401,1354625,1354749,1354875,1355174,1355205,1355217,3700315,3700476,4017608,4029815,4031109,4043492,4045841,4048347,4061945,'
b'\n<!DOCTYPE html>\n<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-US" prefix="og: http://ogp.me/ns#" class="no-js">\n\n<head>\n\t\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n<meta charset="utf-8" />\n<link rel="canonical" href="https://www.apple.com/" />\n\n\n\t\n\t<link rel="alternate" href="https://www.apple.com/" hreflang="en-US" /><link rel="alternate" href="https://www.apple.com/ae-ar/" hreflang="ar-AE" /><link rel="alternate" href="https://www.apple.com/ae/" hreflang="en-AE" /><link rel="alternate" href="https://'
b'<!DOCTYPE html>\n<html id="atomic" lang="en-US" class="atomic my3columns  l-out Pos-r https fp fp-v2 rc1 fp-default mini-uh-on viewer-right two-col ntk-wide ltr desktop Desktop bkt201">\n<head>\n    \n    <title>Yahoo</title><meta http-equiv="x-dns-prefetch-control" content="on"><link rel="dns-prefetch" href="//s.yimg.com"><link rel="preconnect" href="//s.yimg.com"><link rel="dns-prefetch" href="//search.yahoo.com"><link rel="preconnect" href="//search.yahoo.com"><link rel="dns-prefetch" href="//y.analytics.yah'
b'<!DOCTYPE html>\n<html lang="en-US">\n<head>\n\t<meta charset="UTF-8">\n\t<meta name="viewport" content="width=device-width, initial-scale=1">\n\t<title>IBM - United States</title>\n\t<link rel="canonical" href="https://www.ibm.com/us-en/"/>\n\t<meta name="robots" content="index,follow">\n\t<meta name="description" content="For more than a century IBM has been dedicated to every client&#x27;s success and to creating innovations that matter for the world">\n\t<meta name="keywords" content="IBM">\n\t<meta name="dcterms.date" c'
Exception in thread Thread-5:
Traceback (most recent call last):
  File "/home/milenko/anaconda3/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "f1.py", line 17, in run
    url=urllib.request.urlopen(host)
  File "/home/milenko/anaconda3/lib/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/home/milenko/anaconda3/lib/python3.6/urllib/request.py", line 532, in open
    response = meth(req, response)
  File "/home/milenko/anaconda3/lib/python3.6/urllib/request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "/home/milenko/anaconda3/lib/python3.6/urllib/request.py", line 564, in error
    result = self._call_chain(*args)
  File "/home/milenko/anaconda3/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/home/milenko/anaconda3/lib/python3.6/urllib/request.py", line 756, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/home/milenko/anaconda3/lib/python3.6/urllib/request.py", line 532, in open
    response = meth(req, response)
  File "/home/milenko/anaconda3/lib/python3.6/urllib/request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "/home/milenko/anaconda3/lib/python3.6/urllib/request.py", line 570, in error
    return self._call_chain(*args)
  File "/home/milenko/anaconda3/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/home/milenko/anaconda3/lib/python3.6/urllib/request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 503: Service Unavailable

Why?

MishaVacic

1 Answer

As the error says, you're getting an HTTP error; it has nothing to do with threads. The URL you're requesting is returning a 503 Service Unavailable response.

503 SERVICE UNAVAILABLE: The server is currently unable to handle the request due to a temporary overload or scheduled maintenance, which will likely be alleviated after some delay.

The server MAY send a Retry-After header field to suggest an appropriate amount of time for the client to wait before retrying the request.

Note: The existence of the 503 status code does not imply that a server has to use it when becoming overloaded. Some servers might simply refuse the connection.
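Since the failure is an HTTP response and not a threading fault, one option is to catch it inside the worker. In the code from the question, the uncaught exception ends that thread before task_done() is called, so queue.join() may never return. The following is only a sketch: the names worker, fetch_all, and fake_opener are made up for illustration, the .invalid hosts are placeholders, and the stand-in opener exists just so the example runs without network access.

```python
import io
import urllib.error
import urllib.request
from queue import Queue
from threading import Thread


def worker(queue, results, opener=urllib.request.urlopen):
    """Fetch each queued URL; store the first 512 bytes or the error raised."""
    while True:
        host = queue.get()
        try:
            with opener(host) as response:
                results[host] = response.read(512)
        except urllib.error.URLError as err:  # HTTPError is a subclass of URLError
            results[host] = err
        finally:
            queue.task_done()  # always mark the item done so join() can return


def fetch_all(hosts, opener=urllib.request.urlopen, num_threads=5):
    """Run the workers over all hosts and return a dict of host -> result."""
    queue = Queue()
    results = {}
    for _ in range(num_threads):
        Thread(target=worker, args=(queue, results, opener), daemon=True).start()
    for host in hosts:
        queue.put(host)
    queue.join()
    return results


def fake_opener(url):
    """Hypothetical stand-in for urlopen: one host answers 503, the rest succeed."""
    if "busy" in url:
        raise urllib.error.HTTPError(url, 503, "Service Unavailable", None, io.BytesIO(b""))
    return io.BytesIO(b"<html>" + url.encode() + b"</html>")


results = fetch_all(["http://ok.invalid", "http://busy.invalid"], opener=fake_opener)
for host, outcome in results.items():
    print(host, outcome)
```

With the real urlopen (the default for opener), a host that answers 503 ends up in results as an HTTPError instead of killing its thread.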

Most likely, you're sending requests too quickly and have exceeded the server's throttle limit. You can confirm this by checking whether the response has a Retry-After header. The message body may also explain what the throttle limit is.
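A rough way to look at the header and body is to catch the HTTPError and read them from the exception object, which also acts as the response. This is a sketch with helper names of my own (describe_http_error, fetch_or_describe); the error at the bottom is built by hand with invented header and body values so the example runs offline.

```python
import io
import urllib.error
import urllib.request
from email.message import Message


def describe_http_error(err, body_bytes=200):
    """Summarize an HTTPError: status code, Retry-After header, start of body."""
    return {
        "code": err.code,
        "retry_after": err.headers.get("Retry-After"),  # None if the header is absent
        "body": err.read(body_bytes),
    }


def fetch_or_describe(url):
    """Return the first 512 bytes of the page, or a summary of the HTTP error."""
    try:
        with urllib.request.urlopen(url) as response:
            return response.read(512)
    except urllib.error.HTTPError as err:
        return describe_http_error(err)


# Hand-built 503 for illustration only; the header and body values are invented.
headers = Message()
headers["Retry-After"] = "120"
fake_error = urllib.error.HTTPError(
    "http://example.invalid", 503, "Service Unavailable", headers, io.BytesIO(b"busy")
)
info = describe_http_error(fake_error)
print(info)
```

Calling fetch_or_describe on the failing host should show whether the server actually sent a Retry-After header or an explanation in the body.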

The solution is to slow down your requests to the service. Read their documentation and find out what their throttle limits are, then update your code to stay within those limits.
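One possible way to slow down is to pause and retry when a 503 comes back, waiting for the Retry-After value when it is a plain number of seconds. This is a sketch under assumptions: retry_delay, fetch_with_retry, and fake_opener are hypothetical names, the one-second default and three attempts are arbitrary, and the stand-in opener and sleep let the example run without network access or real waiting.

```python
import io
import time
import urllib.error
import urllib.request
from email.message import Message


def retry_delay(err, default_delay=1.0):
    """Seconds to wait: Retry-After if it is a plain number, otherwise the default."""
    value = err.headers.get("Retry-After") if err.headers else None
    if value is not None and value.strip().isdigit():
        return float(value)
    return default_delay  # header missing, or given as an HTTP date


def fetch_with_retry(url, attempts=3, opener=urllib.request.urlopen, sleep=time.sleep):
    """Fetch the first 512 bytes, pausing and retrying when the server answers 503."""
    for attempt in range(1, attempts + 1):
        try:
            with opener(url) as response:
                return response.read(512)
        except urllib.error.HTTPError as err:
            if err.code != 503 or attempt == attempts:
                raise  # a different error, or no attempts left
            sleep(retry_delay(err))


# Offline demonstration: a stand-in opener that fails once with 503, then succeeds.
calls = []
waits = []


def fake_opener(url):
    calls.append(url)
    if len(calls) == 1:
        headers = Message()
        headers["Retry-After"] = "2"
        raise urllib.error.HTTPError(url, 503, "Service Unavailable", headers, io.BytesIO(b""))
    return io.BytesIO(b"<html>ok</html>")


result = fetch_with_retry("http://example.invalid", opener=fake_opener, sleep=waits.append)
print(result, waits)
```

In real use you would leave opener and sleep at their defaults and pick the delay and attempt count from the service's documented limits.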

Soviut