
I want to download a file using multiple threads, and I have the following code:

#!/usr/bin/env python

import httplib


def main():
    url_opt = '/film/0d46e21795209bc18e9530133226cfc3/7f_Naruto.Uragannie.Hroniki.001.seriya.a1.20.06.13.mp4'

    headers = {}
    headers['Accept-Language'] = 'en-GB,en-US,en'
    headers['Accept-Encoding'] = 'gzip,deflate,sdch'
    headers['Accept-Charset'] = 'ISO-8859-1,utf-8,*'
    headers['Cache-Control'] = 'max-age=0'
    headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 5.1)'
    headers['Connection'] = 'keep-alive'
    headers['Accept'] = 'text/html,application/xhtml+xml,application/xml,*/*'
    headers['Range'] = ''  # empty, so no range is requested (the output shows 200 and the whole file)

    conn = httplib.HTTPConnection('data09-cdn.datalock.ru:80')
    conn.request("GET", url_opt, '', headers)

    print "Request sent"

    resp = conn.getresponse()
    print resp.status
    print resp.reason
    print resp.getheaders()

    # Binary mode: the response body is MP4 video data, not text.
    file_for_write = open('cartoon.mp4', 'wb')
    file_for_write.write(resp.read())
    file_for_write.close()

    conn.close()


if __name__ == "__main__":
    main()

Here is the output:

Request sent
200
OK
[('content-length', '62515220'), ('accept-ranges', 'bytes'), ('server', 'nginx/1.2.7'), ('last-modified', 'Thu, 20 Jun 2013 12:10:43 GMT'), ('connection', 'keep-alive'), ('date', 'Fri, 14 Feb 2014 07:53:30 GMT'), ('content-type', 'video/mp4')]

This code works, but I cannot work out from the documentation how to download a file using ranges. The response headers sent by the server include:

 ('content-length', '62515220'), ('accept-ranges', 'bytes')

So the server supports ranges in 'bytes' units, and the content size is 62515220 bytes.
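For reference, a minimal sketch of reading those two facts out of the list that getheaders() returns. It is written for Python 3, where httplib was renamed http.client; the helper name range_info is hypothetical, and the sample list is copied from the output above:

```python
def range_info(headers):
    """Return (supports_byte_ranges, content_length) from (name, value) pairs,
    e.g. the list returned by resp.getheaders(). range_info is a hypothetical helper."""
    # Header names are case-insensitive, so compare them in lower case.
    lookup = {name.lower(): value for name, value in headers}
    # "Accept-Ranges: bytes" advertises byte ranges; "none" or a missing
    # header means nothing is advertised.
    units = lookup.get('accept-ranges', 'none').lower().split(',')
    supports = 'bytes' in [unit.strip() for unit in units]
    length = lookup.get('content-length')
    return supports, (int(length) if length is not None else None)


# Header list copied from the question's output.
headers = [('content-length', '62515220'), ('accept-ranges', 'bytes'),
           ('server', 'nginx/1.2.7'), ('last-modified', 'Thu, 20 Jun 2013 12:10:43 GMT'),
           ('connection', 'keep-alive'), ('date', 'Fri, 14 Feb 2014 07:53:30 GMT'),
           ('content-type', 'video/mp4')]
print(range_info(headers))  # (True, 62515220)
```

A missing Accept-Ranges header does not by itself prove that ranges are unsupported (HTTP allows a client to send Range anyway), so checking the status code of the actual range request is still the reliable test.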

However, this request downloads the whole file. What I want is to first obtain information from the server, such as whether the file can be downloaded using HTTP range requests and what its content size is, without downloading it. And how can I create an HTTP request for a range (e.g., 0~25000)?

Khamidulla
  • This might help: http://stackoverflow.com/q/8293687/2319400 – sebastian Feb 14 '14 at 09:50
  • See here: http://stackoverflow.com/questions/1798879/download-file-using-partial-download-http Different library, but should get you on the right track. – pi. Feb 14 '14 at 10:01
  • @sebastian Thank you for your comment. I already saw that answer, and I have captured packets with Wireshark. However, it is not clear how to detect whether the server supports range selection. Is there a way to check whether a file can be downloaded using range selection? Some applications support multi-threaded downloading with range selection (e.g., flashgot, reget), but if the server does not support it, they still try to download the file in other threads. So how can I obtain server or file information that tells me whether ranges are supported? – Khamidulla Feb 14 '14 at 14:05
  • @pi. Thank you for your comment. If the functionality provided by `httplib` is not enough, I will definitely use a library that supports it. – Khamidulla Feb 14 '14 at 14:07

1 Answer


Pass a Range header with bytes=start_offset-end_offset as the range specifier.

For example, the following code retrieves the first 300 bytes (0-299):

>>> import httplib
>>> conn = httplib.HTTPConnection('localhost')
>>> conn.request("GET", '/', headers={'Range': 'bytes=0-299'}) # <----
>>> resp = conn.getresponse()
>>> resp.status
206
>>> resp.status == httplib.PARTIAL_CONTENT
True
>>> resp.getheader('content-range')
'bytes 0-299/612'
>>> content = resp.read()
>>> len(content)
300

NOTE: Both start_offset and end_offset are inclusive.
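Because both offsets are inclusive, a range covers end_offset - start_offset + 1 bytes; the question's 0~25000 would be sent as bytes=0-25000 and returns 25001 bytes. A Python 3 sketch of building the header value and splitting a known Content-Length into non-overlapping inclusive ranges, one per download thread (range_header and split_ranges are hypothetical helper names):

```python
def range_header(start, end):
    """Build a Range header value; both offsets are inclusive."""
    if start < 0 or end < start:
        raise ValueError('invalid byte range: %d-%d' % (start, end))
    return 'bytes=%d-%d' % (start, end)


def split_ranges(total, parts):
    """Split `total` bytes into contiguous inclusive (start, end) ranges."""
    if total <= 0 or parts <= 0:
        raise ValueError('total and parts must be positive')
    parts = min(parts, total)  # never create empty chunks
    size, extra = divmod(total, parts)
    ranges, start = [], 0
    for i in range(parts):
        # The first `extra` chunks take one leftover byte each.
        length = size + (1 if i < extra else 0)
        ranges.append((start, start + length - 1))  # end is inclusive
        start += length
    return ranges


print(range_header(0, 25000))  # bytes=0-25000
print(split_ranges(612, 3))    # [(0, 203), (204, 407), (408, 611)]
```

Each (start, end) pair can then be passed to range_header and requested on its own connection.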

UPDATE

If the server does not understand the Range header, it responds with status code 200 (httplib.OK) instead of 206 (httplib.PARTIAL_CONTENT) and sends the whole content. To make sure the server responded with partial content, check the status code:

>>> resp.status == httplib.PARTIAL_CONTENT
True
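A minimal Python 3 sketch (http.client is the Python 3 name of httplib) that wraps this check in a hypothetical helper, fetch_range. It returns None when the server answers 200, so a caller can fall back to a single-stream download, and it also checks Content-Range and the body length. It takes an open connection as a parameter on the assumption that each download thread uses its own HTTPConnection rather than sharing one:

```python
import http.client
import re

# Matches e.g. "bytes 0-299/612" (the total may be "*" if unknown).
CONTENT_RANGE = re.compile(r'bytes (\d+)-(\d+)/(\d+|\*)')


def fetch_range(conn, path, start, end):
    """GET bytes start..end (both inclusive) of path over an open connection.

    Returns the bytes received, or None if the server ignored the Range
    header and answered 200 with the whole file. (Hypothetical helper.)
    """
    conn.request('GET', path, headers={'Range': 'bytes=%d-%d' % (start, end)})
    resp = conn.getresponse()
    if resp.status == http.client.OK:
        # Range was ignored: don't read the whole file here. Closing the
        # connection discards it; the next request on conn reconnects.
        conn.close()
        return None
    body = resp.read()
    if resp.status != http.client.PARTIAL_CONTENT:
        raise http.client.HTTPException('unexpected status %d' % resp.status)
    header = resp.getheader('Content-Range', '')
    match = CONTENT_RANGE.fullmatch(header)
    if match is None:
        raise ValueError('bad Content-Range: %r' % header)
    first, last = int(match.group(1)), int(match.group(2))
    # The server may shorten the range if end is past the end of the file,
    # but it must start where we asked and match the body length.
    if first != start or last > end or len(body) != last - first + 1:
        raise ValueError('response does not match requested range: %r' % header)
    return body
```

For example, fetch_range(http.client.HTTPConnection('localhost'), '/', 0, 299) would correspond to the session above.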
falsetru
  • Thank you for your answer. I really appreciate your quick response. I will accept your answer. One more thing: is it possible to detect whether the server supports range selection for downloading? – Khamidulla Feb 14 '14 at 14:00
  • @Phoenix, If the server does not support `Range` header, it will respond with 200 (`httplib.OK`) status code instead of 206 (`httplib.PARTIAL_CONTENT`). So check the status code as shown in the example code: `resp.status == httplib.PARTIAL_CONTENT` – falsetru Feb 14 '14 at 14:04
  • Thank you for the clarification. I will upvote your answer tomorrow because you have reached your daily limit today. :) – Khamidulla Feb 14 '14 at 14:09
  • Also, if you could extend your answer with a little explanation, it would help others without their having to read the comments. Thank you. – Khamidulla Feb 14 '14 at 14:10
  • @Phoenix, I thought `>>> resp.status == httplib.PARTIAL_CONTENT` in the answer was enough. I updated the answer with explanation as you suggested. Thank you for the comment. – falsetru Feb 14 '14 at 14:16
  • @Expolarity, This answer's purpose is to download partial content, not inspect whether it's possible. Using HEAD will not download content. – falsetru Oct 16 '20 at 13:57
  • You could also check that the `Accept-Ranges` header exists and is `bytes` (as opposed to `none`). – shreyasminocha Feb 08 '21 at 00:38
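Putting the last two comments together, a sketch of probing a file with a HEAD request before downloading, in Python 3 http.client (probe is a hypothetical name). Since a HEAD response carries headers but no body, this reads Accept-Ranges and Content-Length without downloading the file:

```python
import http.client


def probe(conn, path):
    """Send HEAD and return (accepts_byte_ranges, content_length).

    conn is an http.client.HTTPConnection (or compatible object);
    probe is a hypothetical helper, not part of http.client.
    """
    conn.request('HEAD', path)
    resp = conn.getresponse()
    resp.read()  # a HEAD response has no body; this just finishes the exchange
    accept = (resp.getheader('Accept-Ranges') or 'none').lower()
    supports = 'bytes' in [unit.strip() for unit in accept.split(',')]
    length = resp.getheader('Content-Length')
    return supports, (int(length) if length is not None else None)
```

With the question's server this would be called as probe(http.client.HTTPConnection('data09-cdn.datalock.ru', 80), url_opt). A False result is only a hint, since a server may honour Range without advertising it; the 206 check on the actual GET remains the confirmation.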