Getting the filename and extension from a URL in Python

Question

I am making a downloader app in Python using tkinter and urllib.request, and I want to give the user the option to save the file under its default name and extension. I know there are plenty of tutorials on how to do this, but my problem is with this specific URL: https://encrypted-tbn0.gstatic.com/images?q=tbn%3AANd9GcR9JHQ-y1AyCjkJt3gl0jTtNtQdhv0lCdDYxqnc2wY9zy_hSOSy I have tried several approaches, such as wget and urlparse, but none of them could get the file's extension from its URL. Is there any other way? The wget attempt:

import wget

url = 'https://encrypted-tbn0.gstatic.com/images?q=tbn%3AANd9GcR9JHQ-y1AyCjkJt3gl0jTtNtQdhv0lCdDYxqnc2wY9zy_hSOSy'
# Derive a filename from the URL alone, without downloading anything
test = wget.detect_filename(url)
print(test)

The output with the mentioned URL:

images

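The urllib.parse attempt:

import os
import urllib.parse

url = 'https://encrypted-tbn0.gstatic.com/images?q=tbn%3AANd9GcR9JHQ-y1AyCjkJt3gl0jTtNtQdhv0lCdDYxqnc2wY9zy_hSOSy'
# Only the path component is inspected; the query string is ignored
path = urllib.parse.urlparse(url).path
ext = os.path.splitext(path)[1]
print(path)
print(ext)  # prints an empty line because the path has no extension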

The output with the mentioned URL:

/images

Is there something wrong with the URL?

Check this out. https://stackoverflow.com/questions/31804799/how-to-get-pdf-filename-with-python-requests — Farhood ET, Apr 19 '20 at 09:22
That's because your URL is not sending an image file with a file name and extension. It's just being dynamically rendered in your browser. — Muhammad Ali, Apr 19 '20 at 09:24
As others have noted, you can't get the extension because the URL has no extension -- if you wanted you could do something sneaky like deducing the mime type from the file header and in turn suggesting an appropriate extension from that, using [python-magic](https://github.com/ahupp/python-magic) or the like. This would require reading the first few bits of the file though... — lemonhead, Apr 19 '20 at 09:32
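
To illustrate the file-header idea from the last comment, here is a minimal sketch that does not use python-magic at all. It checks the leading bytes of the download against a small, hand-written signature table. The names SIGNATURES and sniff_extension are made up for this example, and the table covers only a few common formats, so treat it as a starting point rather than a replacement for a real detection library.

```python
# Hypothetical signature table: (leading bytes, suggested extension).
# Only a handful of well-known formats are listed here.
SIGNATURES = [
    (b"\xff\xd8\xff", ".jpg"),
    (b"\x89PNG\r\n\x1a\n", ".png"),
    (b"GIF87a", ".gif"),
    (b"GIF89a", ".gif"),
    (b"%PDF-", ".pdf"),
]


def sniff_extension(head):
    """Return an extension for the given leading bytes, or None if unknown."""
    for magic, ext in SIGNATURES:
        if head.startswith(magic):
            return ext
    return None
```

With requests, for example, the first few bytes of the body could be passed in as sniff_extension(r.content[:16]).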

score 3 · Answer 1 · edited Apr 19 '20 at 09:47

You should be able to get the MIME type from the response headers, then use mimetypes to suggest an extension:

>>> import requests, mimetypes
>>>
>>> r = requests.get('https://encrypted-tbn0.gstatic.com/images?q=tbn%3AANd9GcR9JHQ-y1AyCjkJt3gl0jTtNtQdhv0lCdDYxqnc2wY9zy_hSOSy')
>>> r.headers
{'Accept-Ranges': 'bytes', 'Content-Type': 'image/jpeg', 'Access-Control-Allow-Origin': '*', 'Content-Length': '4517', 'Date': 'Sun, 19 Apr 2020 09:23:26 GMT', 'Expires': 'Mon, 19 Apr 2021 09:23:26 GMT', 'Last-Modified': 'Fri, 15 Jan 2016 11:47:48 GMT', 'X-Content-Type-Options': 'nosniff', 'Server': 'sffe', 'X-XSS-Protection': '0', 'Cache-Control': 'public, max-age=31536000', 'Age': '840', 'Alt-Svc': 'quic=":443"; ma=2592000; v="46,43",h3-Q050=":443"; ma=2592000,h3-Q049=":443"; ma=2592000,h3-Q048=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,h3-T050=":443"; ma=2592000'}
>>>
>>> r.headers['Content-Type']
'image/jpeg'
>>>
>>> mimetypes.guess_all_extensions(r.headers['Content-Type'], strict=False)
['.jpe', '.jpeg', '.jpg']
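
Since guess_all_extensions returns several candidates, you still have to pick one and combine it with a name. A minimal sketch of how that could look is below. The function suggest_filename and the PREFERRED table are hypothetical names introduced for this example; the table simply pins the extension for a couple of common types so the result does not depend on the order mimetypes happens to return.

```python
import mimetypes
import os
from urllib.parse import urlparse

# Hypothetical override table: pins one extension per MIME type
# so the choice does not depend on the ordering used by mimetypes.
PREFERRED = {"image/jpeg": ".jpg", "text/plain": ".txt"}


def suggest_filename(url, content_type, default_stem="download"):
    """Build a filename from the URL path plus the Content-Type header value."""
    path = urlparse(url).path
    stem, ext = os.path.splitext(os.path.basename(path))
    stem = stem or default_stem
    if ext:
        # The URL already carries an extension, so keep it.
        return stem + ext
    # Drop parameters such as "; charset=utf-8" before the lookup.
    mime = (content_type or "").split(";", 1)[0].strip().lower()
    ext = PREFERRED.get(mime) or mimetypes.guess_extension(mime) or ""
    return stem + ext
```

With the response from above, this would be called as suggest_filename(r.url, r.headers.get('Content-Type')).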

score 0 · Answer 2 · answered Apr 19 '20 at 09:34

Try this. You can change the name and extension of the output file by editing the final_file_name variable. For this answer, I have left it as "image.jpg".

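import requests

final_url = "https://encrypted-tbn0.gstatic.com/images?q=tbn%3AANd9GcR9JHQ-y1AyCjkJt3gl0jTtNtQdhv0lCdDYxqnc2wY9zy_hSOSy"
final_file = requests.get(final_url)
final_file.raise_for_status()  # fail on an HTTP error instead of saving an error page
final_file_name = "image.jpg"
# A context manager makes sure the file is flushed and closed
with open(final_file_name, "wb") as f:
    f.write(final_file.content)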
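
Because the question uses urllib.request rather than requests, here is a hedged sketch of deriving the name from the response headers instead of hardcoding it. It assumes the headers object behaves like email.message.Message, which is what urllib.request responses expose. The function name name_from_headers and the "download" fallback are invented for this example. A filename sent in Content-Disposition is preferred, and Content-Type is only used to guess an extension when no filename is sent.

```python
import mimetypes
import os
from email.message import Message


def name_from_headers(headers, fallback_stem="download"):
    """Pick a filename from HTTP headers given as an email.message.Message."""
    # A server-supplied name (Content-Disposition: ...; filename="...") wins.
    name = headers.get_filename()
    if name:
        # Strip any directory part so the name cannot point outside the target folder.
        return os.path.basename(name)
    ext = ""
    # get_content_type() falls back to text/plain when the header is missing,
    # so only guess an extension when Content-Type was actually sent.
    if headers.get("Content-Type"):
        ext = mimetypes.guess_extension(headers.get_content_type()) or ""
    return fallback_stem + ext
```

Usage would look roughly like this: open the URL with urllib.request.urlopen(url) as resp, call name_from_headers(resp.headers), then write resp.read() to a file opened under that name in "wb" mode.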
