import re, urllib.request

textfile = open('depth_1.txt','wt')
print('enter the url you would like to crawl')
print('Usage - "http://phocks.org/stumble/creepy/" <-- with the double quotes')
my_url = input()
for i in re.findall(b'''href=["'](.[^"']+)["']''', urllib.request.urlopen(my_url).read(), re.I):
    print(i)
    for ee in re.findall(b'''href=["'](.[^"']+)["']''', urllib.request.urlopen(i).read(), re.I): #this is line 20!
        print(ee)
        textfile.write(ee+'\n')
textfile.close()

After looking around for a solution to my problem, I couldn't find a fix. The error occurs in line 20 (AttributeError: 'bytes' object has no attribute 'timeout'). I don't fully understand the error, so I'm looking for an answer and an explanation of what I did wrong. Thanks!

user3709089

3 Answers


From the docs for urllib.request.urlopen:

urllib.request.urlopen(url[, data][, timeout])

    Open the URL url, which can be either a string or a Request object.

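If urllib.request.urlopen doesn't receive a string, it assumes it is a Request object. In your code, re.findall is given a bytes pattern and the bytes returned by read(), so each match i is a bytestring. Passing that bytestring to urlopen is why it fails, e.g.: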

>>> a = urllib.request.urlopen('http://www.google.com').read() # success
>>> a = urllib.request.urlopen(b'http://www.google.com').read() # throws same error
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/urllib/request.py", line 153, in urlopen
    return opener.open(url, data, timeout)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/urllib/request.py", line 446, in open
    req.timeout = timeout
AttributeError: 'bytes' object has no attribute 'timeout'

To fix that, convert your bytestring back to a str by decoding it with the appropriate codec:

>>> a = urllib.request.urlopen(b'http://www.google.com'.decode('ASCII')).read()

Or don't use bytestrings in the first place.
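
As a minimal sketch of that second option (not the asker's original code): decode the downloaded page to text once, then match with a str pattern, so every extracted link is already a str that urlopen accepts. The helper name extract_links and the UTF-8 default are assumptions for illustration; a real page may declare a different charset.

```python
import re

# Same pattern as in the question, but written as a str pattern instead of bytes.
HREF_RE = re.compile(r'''href=["'](.[^"']+)["']''', re.I)

def extract_links(html_bytes, encoding='utf-8'):
    """Return the href values found in a page as a list of str."""
    # errors='replace' keeps going if the page is not valid in the assumed encoding.
    text = html_bytes.decode(encoding, errors='replace')
    return HREF_RE.findall(text)

# Stand-in for the bytes that urllib.request.urlopen(my_url).read() would return.
sample = b'<a href="http://phocks.org/stumble/creepy/">x</a> <A HREF=\'http://example.com/a\'>y</A>'
links = extract_links(sample)
print(links)  # ['http://phocks.org/stumble/creepy/', 'http://example.com/a']
```

With str links, textfile.write(link + '\n') also works, whereas adding '\n' to a bytes object raises a TypeError. Relative links would still need resolving, for example with urllib.parse.urljoin, before they can be opened.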

Peter Gibson

This error occurs because you can't use a bytestring as a URL. Check the encoding in your program.

Bruce Gai

Because it is an AttributeError, some code, either yours or in a library you use, attempted to access the timeout attribute of an object it was passed. In your case a bytes object was passed, which is probably your problem: you are likely passing the wrong type of object around somewhere. If you're sure the objects you are passing are correct, follow the traceback to see exactly where timeout is accessed and check what type of object is expected there.
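
One way to carry out that check is to inspect the type just before the failing call. The sketch below uses a hypothetical helper, as_url_str, which is not part of urllib: it decodes bytes and rejects anything that is neither str nor bytes, so the mistake surfaces with a clearer message. ASCII decoding is an assumption, and the helper deliberately ignores Request objects, which urlopen also accepts.

```python
import re

def as_url_str(url, encoding='ascii'):
    """Hypothetical helper: return url as str, decoding bytes if necessary."""
    if isinstance(url, str):
        return url
    if isinstance(url, (bytes, bytearray)):
        return bytes(url).decode(encoding)
    raise TypeError('expected str or bytes URL, got %s' % type(url).__name__)

# A bytes pattern applied to bytes data yields bytes matches,
# which is how a bytes object ends up being passed to urlopen.
found = re.findall(b'''href=["'](.[^"']+)["']''', b'<a href="http://www.google.com">', re.I)
print(type(found[0]).__name__)  # bytes
print(as_url_str(found[0]))     # http://www.google.com
```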

TheoretiCAL