
Situation: The file to be downloaded is a large file (>100MB). It takes quite some time, especially with slow internet connection.

Problem: However, I just need the file header (the first 512 bytes), which will decide if the whole file needs to be downloaded or not.

Question: Is there a way to download only the first 512 bytes of a file?

Additional information: Currently the download is done using `urllib.urlretrieve` in Python 2.7.

Timothy Wong

2 Answers


I think curl and head would work better than a Python solution here:

curl https://my.website.com/file.txt | head -c 512 > header.txt

EDIT: If you absolutely must do it from a Python script, you can use `subprocess` to run the `curl`-piped-to-`head` command.
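If you go that route, a minimal sketch might look like this (assumes `curl` and `head` are available on the system; `fetch_header` is a made-up helper name):

```python
import subprocess

def fetch_header(url, num_bytes=512):
    # Same pipeline as above: -s silences curl's progress output,
    # and head -c stops reading after num_bytes bytes.
    return subprocess.check_output(
        "curl -s %s | head -c %d" % (url, num_bytes), shell=True)
```

Note that building the command with string interpolation assumes the URL contains no shell metacharacters; quote it (e.g. with `pipes.quote` in Python 2 or `shlex.quote` in Python 3) if the URL comes from untrusted input.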

EDIT 2: For a fully Python solution: The `urlopen` function (`urllib2.urlopen` in Python 2, and `urllib.request.urlopen` in Python 3) returns a file-like stream that you can use the `read` function on, which allows you to specify a number of bytes. For example, `urllib2.urlopen(my_url).read(512)` will return the first 512 bytes of `my_url`.
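A minimal sketch of that approach in Python 3 (`read_header` is a made-up name; as noted in the comments below, `read(512)` limits what you receive, but may not stop the server from sending more of the body behind the scenes):

```python
from urllib.request import urlopen

def read_header(url, num_bytes=512):
    # urlopen returns a file-like response object; read(n) returns
    # at most n bytes of the body.
    with urlopen(url) as response:
        return response.read(num_bytes)
```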

Niema Moshiri
  • Ah yes. The edit was what I needed. But no Python modules can do this? – Timothy Wong Jan 15 '18 at 06:43
    The `urlopen` function (`urllib2.urlopen` in Python 2, and `urllib.request.urlopen` in Python 3) returns a file-like stream that you can use the `read` function on, which allows you to specify a number of bytes. For example, `urllib2.urlopen(my_url).read(512)` will return the first 512 bytes of `my_url`. However, I'm not certain this will *only* download 512 bytes, or if it will try to download the entire file behind-the-scenes and just return the first 512 – Niema Moshiri Jan 15 '18 at 06:47
  • the one in the comment works. do you want to replace it and let me accept as answer? – Timothy Wong Jan 15 '18 at 07:02
  • Might I add that `urllib` also offers the same function, if you want to reduce the number of libraries you import. (I had already imported `urllib` and was hesitant to also import `urllib2`.) – Timothy Wong Jan 15 '18 at 15:14

If the URL you are trying to read responds with a `Content-Length` header, then you can get the file size with `urllib2` in Python 2.

import urllib2

def get_file_size(url):
    # Issue a HEAD request so only the headers are transferred
    request = urllib2.Request(url)
    request.get_method = lambda: 'HEAD'
    response = urllib2.urlopen(request)
    length = response.headers.getheader("Content-Length")
    return int(length)

The function can be called to get the length and compared with some threshold value to decide whether to download or not.

if get_file_size("http://stackoverflow.com") < 1000000:
    # Download

(Note that the Python 3 implementation differs slightly:)

from urllib import request

def get_file_size(url):
    # In Python 3 the request method can be set directly
    r = request.Request(url, method='HEAD')
    response = request.urlopen(r)
    length = response.getheader("Content-Length")
    return int(length)
Simon Streicher
Ilayaraja
  • Love the idea, but I need to compare hash values, and the hash is stored in the file header. The file size can be the same while the contents differ, so the hash is a more reliable check than the file size. – Timothy Wong Jan 15 '18 at 06:56