
I want to read specific bytes from a remote file using a Python module. I am using urllib2. By specific bytes I mean bytes given as (Offset, Size). I know I can read the first X bytes of a remote file with urlopen(link).read(X). Is there a way to read data that starts at Offset and has length Size?

def readSpecificBytes(link, Offset, size):
    # code to be written
Heisenberg

2 Answers


This works with many servers (Apache, etc.), but not always, and especially not with dynamic content such as CGI scripts (*.php, *.cgi, etc.):

import urllib2

def get_part_of_url(link, start_byte, end_byte):
    # HTTP byte ranges are inclusive: this asks for start_byte..end_byte
    req = urllib2.Request(link)
    req.add_header('Range', 'bytes=%d-%d' % (start_byte, end_byte))
    resp = urllib2.urlopen(req)
    return resp.read()
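The question asks for Offset and Size rather than a start and end byte. Because both ends of an HTTP byte range are inclusive, Size bytes starting at Offset correspond to `bytes=Offset-(Offset+Size-1)`. A minimal sketch of the same idea for Python 3 (where `urllib2` became `urllib.request`); the helper names `range_header` and `read_specific_bytes` are illustrative, not part of any library, and the sketch assumes the server honours the Range header:

```python
import urllib.request


def range_header(offset, size):
    """Return the Range header value for `size` bytes starting at `offset`.

    HTTP byte ranges are inclusive at both ends, so the last byte
    requested is offset + size - 1.
    """
    if offset < 0 or size <= 0:
        raise ValueError("offset must be >= 0 and size must be > 0")
    return "bytes=%d-%d" % (offset, offset + size - 1)


def read_specific_bytes(link, offset, size):
    """Fetch only the requested slice of `link` (assumes Range support)."""
    req = urllib.request.Request(link, headers={"Range": range_header(offset, size)})
    with urllib.request.urlopen(req) as resp:
        # 206 Partial Content means the body is just the requested slice;
        # anything else (normally 200) means the server sent the whole file.
        if resp.status != 206:
            raise RuntimeError("server ignored the Range header (status %d)" % resp.status)
        return resp.read()
```

For example, `range_header(1024, 512)` gives `"bytes=1024-1535"`, i.e. 512 bytes starting at offset 1024.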

Note that with this approach the server never sends, and you never download, data you don't need, which can save a great deal of bandwidth if you only want a small part of a large file.

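When it doesn't work, the server typically ignores the Range header and sends the whole file with status 200 instead of 206 (Partial Content). In that case, read and discard the bytes before the offset, then read the part you want.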
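A server that honours the request answers with status 206 and a `Content-Range` header of the form `bytes first-last/total` (where `total` may be `*` if the length is unknown). A sketch of how one might check for that before trusting the body, assuming Python 3; `parse_content_range` and `range_was_honoured` are hypothetical helper names:

```python
import re

# Matches e.g. "bytes 1024-1535/4096" or "bytes 0-9/*"
_CONTENT_RANGE = re.compile(r"bytes (\d+)-(\d+)/(\d+|\*)")


def parse_content_range(value):
    """Parse 'bytes first-last/total' into (first, last, total).

    total is None when the server sends '*' (length unknown).
    Returns None if the header is missing or malformed.
    """
    m = _CONTENT_RANGE.fullmatch((value or "").strip())
    if m is None:
        return None
    first, last, total = m.groups()
    return int(first), int(last), (None if total == "*" else int(total))


def range_was_honoured(status, content_range, offset):
    """True if the response body is the requested slice, not the whole file."""
    if status != 206:
        return False
    parsed = parse_content_range(content_range)
    return parsed is not None and parsed[0] == offset
```

With `urllib.request`, the call would look like `range_was_honoured(resp.status, resp.headers.get("Content-Range"), offset)`. If it returns False, the body is the whole file and the read-and-discard approach applies.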

See the Wikipedia article on HTTP headers for more details.

Pi Marillion

Unfortunately, the file-like object returned by urllib2.urlopen() doesn't have a seek() method. You can work around this by reading and discarding the first Offset bytes:

import urllib2

def readSpecificBytes(link, Offset, size):
    f = urllib2.urlopen(link)
    if Offset > 0:
        f.read(Offset)  # read and discard everything before Offset
    return f.read(size)
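`f.read(Offset)` holds the entire skipped prefix in memory at once. A sketch for Python 3 that discards it in fixed-size chunks instead; it still downloads every byte before the offset, so it bounds memory, not bandwidth. The names `read_at`, `read_specific_bytes` and the 64 KiB chunk size are illustrative assumptions, and `read_at` works on any file-like object, so it can be tried on an in-memory buffer:

```python
import io
import urllib.request


def read_at(stream, offset, size, chunk_size=64 * 1024):
    """Read `size` bytes starting at `offset` from a non-seekable stream.

    The first `offset` bytes are read and thrown away chunk by chunk,
    so memory use stays around `chunk_size` regardless of the offset.
    Returns fewer than `size` bytes if the stream ends early.
    """
    remaining = offset
    while remaining > 0:
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:
            return b""  # stream ended before reaching the offset
        remaining -= len(chunk)
    return stream.read(size)


def read_specific_bytes(link, offset, size):
    with urllib.request.urlopen(link) as f:
        return read_at(f, offset, size)


if __name__ == "__main__":
    # Offline demo on an in-memory stream.
    print(read_at(io.BytesIO(b"0123456789abcdef"), 10, 4))  # b'abcd'
```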
spinlok
  • Note that for very large files this is a fairly expensive method: you have to download all of the content up to the offset. – Jordan Reiter Apr 16 '14 at 13:35