10

I have a list of links, and I am trying to get the size of each file to determine how much computational resources it needs. Is it possible to get just the file size with a GET request or something similar?

Here is an example of one of the links: https://sra-download.ncbi.nlm.nih.gov/traces/sra46/SRR/005150/SRR5273887

Thanks

Joe B
  • 1
    You can take a look [here](https://stackoverflow.com/questions/5909/get-size-of-a-file-before-downloading-in-python). – Vasilis G. Mar 18 '19 at 16:59

3 Answers

9

Use the HTTP HEAD method, which fetches only the response headers for the URL and, unlike an HTTP GET request, does not download the content.

$ curl -I https://sra-download.ncbi.nlm.nih.gov/traces/sra46/SRR/005150/SRR5273887
HTTP/1.1 200 OK
Server: nginx
Date: Mon, 18 Mar 2019 16:56:35 GMT
Content-Type: application/octet-stream
Content-Length: 578220087
Last-Modified: Tue, 21 Feb 2017 12:13:19 GMT
Connection: keep-alive
Accept-Ranges: bytes

The file size is in the 'Content-Length' header. In Python 3.6:

>>> import urllib.request
>>> req = urllib.request.Request('https://sra-download.ncbi.nlm.nih.gov/traces/sra46/SRR/005150/SRR5273887', 
                                 method='HEAD')
>>> f = urllib.request.urlopen(req)
>>> f.status
200
>>> f.headers['Content-Length']
'578220087'
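Since the question starts from a list of links, the same HEAD request can be wrapped in a small helper. This is a minimal sketch using only the Python 3 standard library; the names `parse_content_length`, `remote_size` and `remote_sizes` are made up for illustration, and the 10 second timeout is an arbitrary assumption.

```python
import urllib.request


def parse_content_length(value):
    """Convert a Content-Length header value to an int, or None if absent/invalid."""
    if value is None:
        return None
    try:
        size = int(value.strip())
    except ValueError:
        return None
    return size if size >= 0 else None


def remote_size(url, timeout=10):
    """Send a HEAD request and return the reported size in bytes (or None)."""
    req = urllib.request.Request(url, method='HEAD')
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_content_length(resp.headers.get('Content-Length'))


def remote_sizes(urls):
    """Map each URL to its reported size; failed requests map to None."""
    sizes = {}
    for url in urls:
        try:
            sizes[url] = remote_size(url)
        except OSError:  # URLError and HTTPError are subclasses of OSError
            sizes[url] = None
    return sizes
```

Calling `remote_sizes(list_of_links)` returns a dict keyed by URL. A value of None means either that the request failed or that the server did not report a usable Content-Length.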
Steven Graham
  • Note: if the remote server does not implement HEAD, you can still achieve something similar by passing stream=True in the Python requests library, as in https://stackoverflow.com/a/44299915, and closing each request right after you have read its headers. – Maarten Derickx Dec 21 '20 at 16:42
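The streamed-GET fallback mentioned in the comment could look roughly like this. It is a sketch, not a tested recipe: `size_via_streamed_get` is a hypothetical name, the caller is assumed to pass in a `requests.Session()`, and the timeout value is arbitrary.

```python
def size_via_streamed_get(url, session, timeout=10):
    """Issue a GET with stream=True, read only the headers, then close.

    `session` is assumed to be a requests.Session (or any object whose
    .get method accepts the same keyword arguments).
    """
    # With stream=True the body is not downloaded up front; leaving the
    # with block closes the response before any of it is read.
    with session.get(url, stream=True, allow_redirects=True,
                     timeout=timeout) as resp:
        value = resp.headers.get('Content-Length')
    return int(value) if value is not None and value.isdigit() else None


# Usage (requires `pip install requests`):
#   import requests
#   size_via_streamed_get(some_url, requests.Session())
```

Servers that use chunked transfer encoding do not send a Content-Length header, so the function can return None even when the request succeeds.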
8

You need to use the HEAD method. The example uses requests (pip install requests).

#!/usr/bin/env python
# display URL file size without downloading

import sys
import requests

# pass URL as first argument
response = requests.head(sys.argv[1], allow_redirects=True)

size = response.headers.get('content-length', -1)

# size in megabytes (Python 2, 3)
print('{:<40}: {:.2f} MB'.format('FILE SIZE', int(size) / float(1 << 20)))

# size in megabytes (f-string, Python 3 only)
# print(f"{'FILE SIZE':<40}: {int(size) / float(1 << 20):.2f} MB")

Also see "How do you send a HEAD HTTP request in Python 2?" if you need a solution based on the standard library.
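The byte-to-megabyte conversion in the script can be pulled into a helper that also covers a missing Content-Length header. This is only an illustrative sketch: `format_megabytes` is a made-up name, and like the script it treats one MB as 2**20 bytes.

```python
def format_megabytes(size_bytes):
    """Return a size in units of 2**20 bytes with two decimals, or 'unknown'."""
    if size_bytes is None or size_bytes < 0:
        return 'unknown'
    return '{:.2f} MB'.format(size_bytes / float(1 << 20))


# The example file from the question reports 578220087 bytes.
print(format_megabytes(578220087))  # 551.43 MB
```

Returning 'unknown' for None or negative input avoids printing a misleading size when the server does not report one.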

ccpizza
1

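If you're using Python 3, you can use urlopen from urllib.request. Note that this sends a GET request rather than a HEAD request; the headers are available as soon as the response arrives, and closing the response avoids reading the body:

from urllib.request import urlopen
link = "https://sra-download.ncbi.nlm.nih.gov/traces/sra46/SRR/005150/SRR5273887"
# urlopen sends a GET; the with block closes the connection without reading the body
with urlopen(link) as site:
    meta = site.info()
print(meta)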

This will output:

Server: nginx
Date: Mon, 18 Mar 2019 17:02:40 GMT
Content-Type: application/octet-stream
Content-Length: 578220087
Last-Modified: Tue, 21 Feb 2017 12:13:19 GMT
Connection: close
Accept-Ranges: bytes

The Content-Length header is the size of your file in bytes.

Vasilis G.