
I was playing around with the urllib.request module when I found that calling the read() method on the object returned by urllib.request.urlopen() returns nothing the second time:

code:

import urllib.request

url = 'http://www.youtube.com'
resp = urllib.request.urlopen(url)

print(len(resp.read()))  # first call
print(len(resp.read()))  # second call

output:

549444
0

I can't find any documentation on the read() method, and I want to better understand what exactly is happening in the code above. An obvious fix would be to call urlopen() again, but that would be inefficient.

CaptainDaVinci
  • Or just assign the result of the first read... – jonrsharpe Jul 31 '17 at 18:52
  • Yes, this will work, but why doesn't read() work after the first call? – CaptainDaVinci Jul 31 '17 at 18:55
  • In respect to [that duplicate target question](https://stackoverflow.com/q/3906137/216074): `read()` is actually a method defined by the base IO interface. The linked question talks about file handles, but it’s really the same for any IO object you can read from. Once you read from a file/stream/urllib response, you are at the end of that file/stream/urllib response, so there is nothing more to read. – poke Jul 31 '17 at 18:55
  • `seek(0)` would move the pointer back to the start for a file, but it doesn't seem to work for a urllib response (unsupported operation), even though `dir(resp)` lists a `seek` method – CaptainDaVinci Jul 31 '17 at 19:08
  • Yeah, seeking does not work for `urllib` responses. You will have to read the file in full and then keep it in memory as jonrsharpe suggested if you want to use it multiple times. – poke Jul 31 '17 at 19:12
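
The suggestion in the comments above (read the body once and keep it in memory) could look roughly like the sketch below. The helper name `read_once` and the `data:` URL are assumptions made for illustration; the `data:` URL is there only so the snippet runs without network access, and an HTTP URL such as the one in the question would be read the same way. Wrapping the bytes in `io.BytesIO` is one possible way to get a seekable copy, since the urllib response itself does not support seeking.

```python
import io
import urllib.request


def read_once(url):
    """Open `url`, read the whole body a single time, and return it as bytes.

    `read_once` is a hypothetical helper name, not part of urllib.
    """
    with urllib.request.urlopen(url) as resp:
        return resp.read()


# Assumption: a data: URL stands in for a real page so no network is needed.
body = read_once('data:text/plain,hello%20world')

# The bytes object can be reused as often as needed without re-fetching.
print(len(body))
print(len(body))

# If a seekable, file-like object is required, wrap the stored bytes.
buf = io.BytesIO(body)
first = buf.read()    # consumes the stream up to its end
second = buf.read()   # nothing left to read, so this is b''
buf.seek(0)           # rewinding works on BytesIO
third = buf.read()    # full content again
```

The key point is that the response is a stream with a read position: the first `read()` moves that position to the end, so later calls have nothing left to return. Storing the result avoids a second `urlopen()` call.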
