Python split url to find image name and extension

Question

I am looking for a way to extract a filename and extension from a particular URL using Python.

Let's say a URL looks as follows:

picture_page = "http://distilleryimage2.instagram.com/da4ca3509a7b11e19e4a12313813ffc0_7.jpg"

How would I go about getting the following?

filename = "da4ca3509a7b11e19e4a12313813ffc0_7"
file_ext = ".jpg"

score 33 · Answer 1 · edited Dec 29 '20 at 18:41


try:
    # Python 3
    from urllib.parse import urlparse
except ImportError:
    # Python 2
    from urlparse import urlparse
from os.path import splitext, basename

picture_page = "http://distilleryimage2.instagram.com/da4ca3509a7b11e19e4a12313813ffc0_7.jpg"
disassembled = urlparse(picture_page)
filename, file_ext = splitext(basename(disassembled.path))

Because basename keeps only the last component of the URL path, filename contains neither a leading / nor any parent directories.
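
As a hedged illustration of how this approach behaves on URLs with a query string or without an extension, the sketch below wraps the same calls in a helper. The name split_url_filename and the server.com URLs are made up for the example, and only the Python 3 import is shown.

```python
from os.path import basename, splitext
from urllib.parse import urlparse  # Python 3 location of urlparse


def split_url_filename(url):
    """Return (filename, extension) for the last path segment of url."""
    # .path excludes the scheme, host, query string and fragment
    path = urlparse(url).path
    # basename drops parent directories; splitext splits at the last dot
    return splitext(basename(path))


print(split_url_filename("http://server.com/common/image.jpg?xx=345&yy=qwerty"))
# ('image', '.jpg')
print(split_url_filename("http://server.com/common/image"))
# ('image', '')
```

The query string never reaches splitext because urlparse stores it separately from the path.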

edited Dec 29 '20 at 18:41

Charles L.


answered May 11 '12 at 13:29

Christian Witts


2

The preceding '/' is not the only problem: if the URL contains other subdirectories, they will be kept in the filename. Maybe the OP wants them, maybe not ;) – Cédric Julien May 11 '12 at 13:38
@Cédric Julien - Thanks for the reminder about basename to get just the last portion; I edited the post to reflect that. :) – Christian Witts May 11 '12 at 13:47
6

This code also works with files that have no extension and with URLs like `http://server.com/common/image.jpg?xx=345&yy=qwerty`. BTW, in 3.x one needs to use `from urllib.parse import urlparse` – El Ruso Nov 11 '15 at 19:12

Cédric Julien · Answer 2 · 2012-05-11T13:35:29.680

12

Use urlsplit (in the urlparse module on Python 2, in urllib.parse on Python 3) to split the URL, then os.path.splitext to retrieve the filename and extension (use os.path.basename to keep only the last path component):

try:
    # Python 3
    from urllib.parse import urlsplit
except ImportError:
    # Python 2
    from urlparse import urlsplit
import os.path

picture_page = "http://distilleryimage2.instagram.com/da4ca3509a7b11e19e4a12313813ffc0_7.jpg"

print(os.path.splitext(os.path.basename(urlsplit(picture_page).path)))
# ('da4ca3509a7b11e19e4a12313813ffc0_7', '.jpg')

edited May 11 '12 at 13:35

answered May 11 '12 at 13:28

Cédric Julien


urlparse has moved to urllib.parse in Python 3. Your solution still works. Thanks. :) – kinshuk4 May 18 '16 at 12:06

score 10 · Answer 3 · answered May 11 '12 at 13:27


filename = picture_page.split('/')[-1].split('.')[0]
file_ext = '.'+picture_page.split('.')[-1]
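
One caveat with the two lines above: when the file name itself contains dots, split('.')[0] keeps only the part before the first dot. A minimal import-free sketch that splits at the last dot instead is shown below. The helper name split_name_ext and the example.com URLs are hypothetical, and it assumes the URL has no query string or fragment.

```python
def split_name_ext(url):
    """Split the last URL segment into (name, extension) without imports."""
    # Everything after the final "/" is treated as the file name
    last_segment = url.rsplit('/', 1)[-1]
    # rpartition splits at the LAST dot, so "my.photo.jpg" keeps "my.photo"
    name, dot, ext = last_segment.rpartition('.')
    if not dot:
        # No dot at all: the whole segment is the name, extension is empty
        return last_segment, ''
    return name, dot + ext


print(split_name_ext("http://example.com/img/my.photo.jpg"))  # ('my.photo', '.jpg')
print(split_name_ext("http://example.com/img/README"))        # ('README', '')
```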

answered May 11 '12 at 13:27

Niek de Klein


Thank you! This can be useful if there is no reason to import extra libraries – Roman Podlinov May 22 '13 at 13:58

bad_keypoints · Answer 4 · 2014-09-22T07:55:08.837

6

# Here's your link:
picture_page = "http://distilleryimage2.instagram.com/da4ca3509a7b11e19e4a12313813ffc0_7.jpg"

#Here's your filename and ext:
filename, ext = (picture_page.split('/')[-1].split('.'))

When you call picture_page.split('/'), it returns a list of the strings in your URL that were separated by /. Index -1 selects the last element of that list, which in your case is the filename: da4ca3509a7b11e19e4a12313813ffc0_7.jpg

Splitting that on the delimiter ., you get two values, da4ca3509a7b11e19e4a12313813ffc0_7 and jpg, because they are separated by the period you passed to split().

Since that last split returns exactly two values, you can unpack the resulting list directly into two names. The result is equivalent to:

filename,ext = ('da4ca3509a7b11e19e4a12313813ffc0_7', 'jpg')
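
Note that the two-name unpacking raises ValueError when the last segment contains no dot or more than one. A minimal sketch of a guard follows, assuming you want None when there is no extension. The helper name unpack_name_ext and the example.com URLs are hypothetical.

```python
def unpack_name_ext(url):
    """Return (filename, ext) from the last URL segment; ext is None if absent."""
    parts = url.split('/')[-1].split('.')
    if len(parts) == 1:
        # No dot in the last segment, so there is no extension to report
        return parts[0], None
    # Rejoin everything before the final dot so dotted names stay intact
    return '.'.join(parts[:-1]), parts[-1]


print(unpack_name_ext("http://example.com/a/archive.tar.gz"))  # ('archive.tar', 'gz')
print(unpack_name_ext("http://example.com/a/README"))          # ('README', None)
```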

edited Sep 22 '14 at 07:55

answered Sep 18 '14 at 13:16

bad_keypoints


1

While your code might (or might not) work, it would be great if you added a brief explanation of the problem and how your code solves it. As is, it does not provide a full answer according to the [help center](http://stackoverflow.com/help/how-to-answer) – dic19 Sep 18 '14 at 15:19
It will always work, provided he gets his file URLs in a form where the file always has an extension. He could add a simple if statement to handle files with no extension (`if len(url.split('/')[-1].split('.'))==1: #No extension; else: #Get filename,ext`) – bad_keypoints Sep 22 '14 at 07:57
Please note that the point of my comment is not whether your code works. It's about the answer's quality. Your answer is better now that you have added a brief explanation as suggested. +1 for your edit :) – dic19 Sep 22 '14 at 11:29
Thank you anyway, it helped me make my answer better. – bad_keypoints Sep 22 '14 at 13:12

Levon · Answer 5 · 2012-05-11T13:35:09.980

3

os.path.splitext will help you extract the filename and extension once you have extracted the relevant string from the URL using urlparse:

   fName, ext = os.path.splitext('yourImage.jpg')
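
For reference, a short sketch of how os.path.splitext treats a few common file name shapes. The file names other than yourImage.jpg are illustrative only.

```python
import os.path

# splitext splits at the last dot and keeps the dot in the extension
print(os.path.splitext('yourImage.jpg'))   # ('yourImage', '.jpg')
print(os.path.splitext('archive.tar.gz'))  # ('archive.tar', '.gz')
# No dot means an empty extension rather than an error
print(os.path.splitext('README'))          # ('README', '')
# A leading dot is treated as part of the name, not as an extension
print(os.path.splitext('.hidden'))         # ('.hidden', '')
```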

edited May 11 '12 at 13:35

answered May 11 '12 at 13:28

Levon


score 0 · Answer 6 · answered Aug 21 '19 at 09:06

You can also find the image name and extension with a regular expression.

import re

picture_page = "http://distilleryimage2.instagram.com/da4ca3509a7b11e19e4a12313813ffc0_7.jpg"

# The greedy .* runs up to the last "/"; the named groups capture the name and extension
regex = re.compile(r'.*/(?P<name>\w+)\.(?P<ext>\w+)')

# Search once and reuse the match object
match = regex.search(picture_page)
print(match.group('name'))
print(match.group('ext'))

score -2 · Answer 7 · answered May 11 '12 at 13:31

>>> import re
>>> s = 'picture_page = "http://distilleryimage2.instagram.com/da4ca3509a7b11e19e4a12313813ffc0_7.jpg"'
>>> re.findall(r'\/([a-zA-Z0-9_]*)\.[a-zA-Z]*\"$',s)[0]
'da4ca3509a7b11e19e4a12313813ffc0_7'
>>> re.findall(r'([a-zA-Z]*)\"$',s)[0]
'jpg'
