165

How can I extract whatever follows the last slash in a URL in Python? For example, these URLs should return the following:

URL: http://www.test.com/TEST1
returns: TEST1

URL: http://www.test.com/page/TEST2
returns: TEST2

URL: http://www.test.com/page/page/12345
returns: 12345

I've tried urlparse, but that gives me the full path, such as /page/page/12345.

Remi Guan
mix

14 Answers

340

You don't need anything fancy. Python's built-in string methods let you split your URL into the 'filename' part and the rest:

url.rsplit('/', 1)

So you can get the part you're interested in simply with:

url.rsplit('/', 1)[-1]
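As a quick check, here is the rsplit approach applied to the question's three example URLs. Only built-in string methods are used, so nothing needs importing.

```python
# The three example URLs from the question.
urls = [
    "http://www.test.com/TEST1",
    "http://www.test.com/page/TEST2",
    "http://www.test.com/page/page/12345",
]

# maxsplit=1 performs at most one split, at the rightmost "/".
print("http://www.test.com/page/TEST2".rsplit("/", 1))  # ['http://www.test.com/page', 'TEST2']

# [-1] keeps the piece after the last slash.
results = [url.rsplit("/", 1)[-1] for url in urls]
print(results)  # ['TEST1', 'TEST2', '12345']
```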
Remi Guan
Luke404
  • `url.rsplit('/', 1)` returns a list, and `url.rsplit('/', 1)[-1]` is the bit after the last slash. – Hugo Oct 13 '15 at 12:26
  • Another way to do it would be: `url.rsplit('/', 1).pop()` – Alex Fortin Mar 02 '18 at 17:55
  • **WARNING:** This basic trick breaks completely on URLs such as `http://www.example.com/foo/?entry=the/bar#another/bar`. But basic parsing like `rsplit` is okay if you are absolutely certain there will never be any slashes in your query or fragment parameters. However, I shudder to think of how many codebases actually contain this `rsplit` code and its associated bug with query handling. **People who want ABSOLUTE SECURITY AND RELIABILITY should be using `urllib.parse.urlparse()` instead! You can then use the `path` value that it returns and split THAT to ensure that you've split ONLY the path.** – Mitch McMabers May 31 '20 at 07:26
  • **CODE: An example of how to implement the better method:** `from urllib.parse import urlparse; p = urlparse("http://www.example.com/foo.htm?entry=the/bar#another/bar"); print(p.path.rsplit("/", 1)[-1])` Result: `foo.htm` – Mitch McMabers May 31 '20 at 07:37
  • @MitchMcMabers please turn this into an answer (which should then be the accepted one) – Caterpillaraoz Feb 19 '21 at 18:05
  • @Caterpillaraoz I count two non-accepted answers here that suggest exactly this for years now :) – tzot Sep 20 '21 at 08:51
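To make the warning in the comments concrete, here is a small side-by-side sketch using the URL from the comment above: the naive rsplit picks up a slash inside the fragment, while splitting only the parsed path does not.

```python
from urllib.parse import urlparse

# URL whose query string and fragment both contain slashes (from the comment above).
url = "http://www.example.com/foo.htm?entry=the/bar#another/bar"

# Naive: splits at the last "/" anywhere in the string, here inside the fragment.
naive = url.rsplit("/", 1)[-1]

# Safer: isolate the path component first, then split only that.
safe = urlparse(url).path.rsplit("/", 1)[-1]

print(naive)  # bar
print(safe)   # foo.htm
```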
94

One more (idio(ma)tic) way:

URL.split("/")[-1]
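`split("/")` builds the full list of segments rather than splitting once from the right, but `[-1]` still picks the last one. A short sketch, including the trailing-slash case (which behaves the same way with rsplit):

```python
url = "http://www.test.com/page/page/12345"

# split("/") breaks the URL at every slash.
segments = url.split("/")
print(segments)      # ['http:', '', 'www.test.com', 'page', 'page', '12345']
print(segments[-1])  # 12345

# A trailing slash leaves an empty final segment, so the result is ''.
trailing = "http://www.test.com/page/".split("/")[-1]
print(repr(trailing))  # ''
```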
Kimvais
16

rsplit should be up to the task:

In [1]: 'http://www.test.com/page/TEST2'.rsplit('/', 1)[1]
Out[1]: 'TEST2'
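One caveat with indexing `[1]`: it assumes the string contains at least one slash. When there is none, rsplit returns a one-element list and `[1]` raises IndexError, whereas `[-1]` returns the whole string. A minimal sketch (the helper name `after_last_slash` is hypothetical):

```python
def after_last_slash(s):
    # Hypothetical helper. rsplit("/", 1) returns a one-element list when
    # there is no "/", so [-1] (rather than [1]) is the safe index.
    return s.rsplit("/", 1)[-1]

no_slash = "no-slash-here"
print(no_slash.rsplit("/", 1))  # ['no-slash-here']

try:
    no_slash.rsplit("/", 1)[1]
    raised = False
except IndexError:
    raised = True

print(raised)                                             # True
print(after_last_slash(no_slash))                         # no-slash-here
print(after_last_slash("http://www.test.com/page/TEST2"))  # TEST2
```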
Benjamin Wohlwend
13

urlparse is fine to use if you want to (say, to get rid of any query string parameters).

import urllib.parse

urls = [
    'http://www.test.com/TEST1',
    'http://www.test.com/page/TEST2',
    'http://www.test.com/page/page/12345',
    'http://www.test.com/page/page/12345?abc=123'
]

for url in urls:
    url_parts = urllib.parse.urlparse(url)
    path_parts = url_parts.path.rpartition('/')
    print('URL: {}\nreturns: {}\n'.format(url, path_parts[2]))

Output:

URL: http://www.test.com/TEST1
returns: TEST1

URL: http://www.test.com/page/TEST2
returns: TEST2

URL: http://www.test.com/page/page/12345
returns: 12345

URL: http://www.test.com/page/page/12345?abc=123
returns: 12345
Jacob Wan
13

You can do it like this:

import os

head, tail = os.path.split(url)

Here, tail will be your file name (the part after the last slash).

Harsha Biyani
neowinston
  • This won't work on systems where the path separator is not "/". One of the notes in the os.path [docs](https://docs.python.org/3/library/os.path.html) mentions a posixpath, but I couldn't import it on my system: "you can also import and use the individual modules if you want to manipulate a path that is always in one of the different formats. They all have the same interface: posixpath for UNIX-style paths" – aschmied Sep 24 '21 at 19:48
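On the portability concern in the comment above: in CPython, `posixpath` is part of the standard library on every platform, Windows included, and its `split()` always treats "/" as the separator. A minimal sketch (note that, unlike urlparse, it does nothing about query strings or fragments):

```python
import posixpath  # importable on every platform, not just POSIX systems

url = "http://www.test.com/page/page/12345"

# posixpath.split always splits at the last "/", regardless of the host OS.
head, tail = posixpath.split(url)
print(head)  # http://www.test.com/page/page
print(tail)  # 12345
```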
10
>>> import os
>>> os.path.basename(os.path.normpath('/folderA/folderB/folderC/folderD/'))
'folderD'
Stéphane Bruckert
Rochan
  • this also works: `from pathlib import Path; print(f"Path(redirected_response.url).stem: {Path(redirected_response.url).stem!r}")` – Alex Glukhovtsev Jun 25 '20 at 08:35
  • [URLs](https://tools.ietf.org/html/rfc3986#section-3) aren't file paths, they can contain a `?query=string` or a `#fragment` after the path. – Boris Verkhovskiy Nov 18 '20 at 22:07
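Both comments can be addressed by combining the ideas: take only the path from urlparse, then apply basename(normpath(...)) from `posixpath` so "/" is the separator on any OS. A sketch under those assumptions; the helper name `last_segment` is hypothetical. Caveats: normpath also collapses `..` segments, and an empty path (e.g. `http://www.test.com`) yields `'.'`.

```python
import posixpath
from urllib.parse import urlparse

def last_segment(url):
    # Hypothetical helper: keep only the path component (dropping query and
    # fragment), let normpath remove a trailing slash, then take the basename.
    path = urlparse(url).path
    return posixpath.basename(posixpath.normpath(path))

print(last_segment("http://www.test.com/page/page/12345?abc=123"))  # 12345
print(last_segment("http://www.test.com/folderA/folderD/"))         # folderD
```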
5

Here's a more general, regex-based way of doing this:

    import re
    re.sub(r'^.+/([^/]+)$', r'\1', url)
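A runnable sketch of the regex on the question's URLs. The greedy `.+/` consumes everything up to the last slash and the group keeps the rest. One behavior worth knowing: when the pattern does not match (for example, with a trailing slash), `re.sub` returns the input unchanged rather than an empty string.

```python
import re

urls = [
    "http://www.test.com/TEST1",
    "http://www.test.com/page/TEST2",
    "http://www.test.com/page/page/12345",
]

# Greedy ".+/" backtracks to the last "/", and ([^/]+)$ captures the tail.
pattern = re.compile(r"^.+/([^/]+)$")

results = [pattern.sub(r"\1", u) for u in urls]
print(results)  # ['TEST1', 'TEST2', '12345']

# No match when the URL ends in "/", so the original string comes back.
unchanged = pattern.sub(r"\1", "http://www.test.com/page/")
print(unchanged)  # http://www.test.com/page/
```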
sandoronodi
4

Use urlparse to get just the path and then split the path you get from it on / characters:

from urllib.parse import urlparse

my_url = "http://example.com/some/path/last?somequery=param"
last_path_fragment = urlparse(my_url).path.split('/')[-1]  # returns 'last'

Note: if your URL's path ends with a / character, the code above returns '' (the empty string). To get the last non-empty segment instead, strip any trailing / characters before you split the path:

my_url = "http://example.com/last/"
# handle URL ending in `/` by removing trailing slashes (str.rstrip takes only the characters to strip).
last_path_fragment = urlparse(my_url).path.rstrip('/').split('/')[-1]  # returns 'last'
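Putting both steps of this answer into one function, a sketch checked against the question's URLs plus the two URLs above. The name `last_path_segment` is hypothetical; note that a bare root path such as `http://example.com/` still gives `''`.

```python
from urllib.parse import urlparse

def last_path_segment(url):
    # Hypothetical helper: parse out the path, strip trailing slashes,
    # then keep what follows the last "/".
    return urlparse(url).path.rstrip("/").split("/")[-1]

urls = [
    "http://www.test.com/TEST1",
    "http://www.test.com/page/TEST2",
    "http://www.test.com/page/page/12345",
    "http://example.com/last/",
    "http://example.com/some/path/last?somequery=param",
]

results = [last_path_segment(u) for u in urls]
print(results)  # ['TEST1', 'TEST2', '12345', 'last', 'last']
```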
Boris Verkhovskiy
4

First extract the path element from the URL:

from urllib.parse import urlparse
parsed = urlparse('https://www.dummy.example/this/is/PATH?q=/a/b&r=5#asx')

and then you can extract the last segment with string functions:

parsed.path.rpartition('/')[2]

(in this example, the result is 'PATH')
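Why rpartition rather than rsplit here: rpartition always returns a 3-tuple `(head, separator, tail)`, even when the separator is missing, so indexing `[2]` never raises. A short sketch using the same example URL:

```python
from urllib.parse import urlparse

parsed = urlparse("https://www.dummy.example/this/is/PATH?q=/a/b&r=5#asx")
print(parsed.path)  # /this/is/PATH

# Separator found: (text before, "/", text after).
print(parsed.path.rpartition("/"))  # ('/this/is', '/', 'PATH')

# Separator missing: two empty strings, then the original string.
print("PATH".rpartition("/"))  # ('', '', 'PATH')
```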

tzot
3

The following solution uses pathlib to parse the path obtained from urllib.parse, which lets it get the last part even when there is a trailing slash:

import urllib.parse
from pathlib import Path

urls = [
    "http://www.test.invalid/demo",
    "http://www.test.invalid/parent/child",
    "http://www.test.invalid/terminal-slash/",
    "http://www.test.invalid/query-params?abc=123&works=yes",
    "http://www.test.invalid/fragment#70446893",
    "http://www.test.invalid/has/all/?abc=123&works=yes#70446893",
]

for url in urls:
    url_path = Path(urllib.parse.urlparse(url).path)
    last_part = url_path.name  # use .stem to cut file extensions
    print(f"{last_part=}")

yields:

last_part='demo'
last_part='child'
last_part='terminal-slash'
last_part='query-params'
last_part='fragment'
last_part='all'
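Following up on the `.stem` remark in the code comment, here is a sketch that uses `PurePosixPath` instead of `Path`. It does no filesystem access and always uses "/" as the separator, so it behaves the same on Windows. The URL and file name are made up for illustration.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical URL pointing at a file with an extension.
path = PurePosixPath(urlparse("http://www.test.invalid/files/report.pdf?dl=1").path)

print(path.name)    # report.pdf  (last segment, extension included)
print(path.stem)    # report      (extension removed)
print(path.suffix)  # .pdf
```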
lcnittl
0

Split the URL and pop the last element: url.split('/').pop()

Atul Yadav
0

Split the URL and pop the last element. In JavaScript, Array.prototype.pop() behaves like this:

const plants = ['broccoli', 'cauliflower', 'cabbage', 'kale', 'tomato'];

console.log(plants.pop());
// expected output: "tomato"

console.log(plants);
// expected output: Array ["broccoli", "cauliflower", "cabbage", "kale"]
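The JavaScript snippet above only demonstrates `pop()` on an array. A rough Python equivalent of the split-and-pop idea, applied to one of the question's URLs:

```python
url = "http://www.test.com/page/page/12345"

parts = url.split("/")  # list of segments between slashes
last = parts.pop()      # list.pop() removes and returns the final element

print(last)   # 12345
print(parts)  # ['http:', '', 'www.test.com', 'page', 'page']
```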
Jaimin Patel
0
extracted_url = url[url.rfind("/")+1:]
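This slice works even without a slash, because `str.rfind` returns -1 when the character is absent, so the slice starts at index 0 and the whole string comes back. A sketch wrapping it in a hypothetical helper, `slice_after_last_slash`:

```python
def slice_after_last_slash(url):
    # Hypothetical helper. rfind returns -1 if "/" is absent, so the
    # slice then starts at 0 and returns the whole string.
    return url[url.rfind("/") + 1:]

print(slice_after_last_slash("http://www.test.com/page/TEST2"))   # TEST2
print(slice_after_last_slash("no-slash-here"))                    # no-slash-here
print(repr(slice_after_last_slash("http://www.test.com/page/")))  # ''
```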
fardjad
-5
last_part = 'http://www.test.com/page/TEST2'.split('/')[4]
print(last_part)

Output: TEST2
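A hard-coded index like `[4]` only gives the last segment when the path has exactly two levels. This sketch compares it with `[-1]` on the question's three URLs; `None` marks the case where `[4]` would raise IndexError.

```python
urls = [
    "http://www.test.com/TEST1",
    "http://www.test.com/page/TEST2",
    "http://www.test.com/page/page/12345",
]

fixed_index = []  # what [4] gives (None where the list is too short)
last_index = []   # what [-1] gives

for u in urls:
    parts = u.split("/")
    # Index 4 is the last segment only for two-level paths like /page/TEST2.
    fixed_index.append(parts[4] if len(parts) > 4 else None)
    # Index -1 always selects the final segment, whatever the depth.
    last_index.append(parts[-1])

print(fixed_index)  # [None, 'TEST2', 'page']
print(last_index)   # ['TEST1', 'TEST2', '12345']
```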

sigod
live_alone