How do I remove a substring from the end of a string (remove a suffix of the string)?

Question

I have the following code:

url = 'abcdc.com'
print(url.strip('.com'))

I expected: abcdc

I got: abcd

Now I do

url.rsplit('.com', 1)

Is there a better way?

See "How do the .strip/.rstrip/.lstrip string methods work in Python?" for a specific explanation of what the first attempt is doing.

Yeah. str.strip doesn't do what you think it does. str.strip removes any of the characters specified from the beginning and the end of the string. So, "acbacda".strip("ad") gives 'cbac'; the a at the beginning and the da at the end were stripped. Cheers. — scvalex, Jun 24 '09 at 15:03
strip strips the characters given from both ends of the string; in your case it strips ".", "c", "o" and "m". — mthurlin, Jun 24 '09 at 14:48
It will also remove those characters from the front of the string. If you just want it to remove from the end, use rstrip() — Andre Miller, Jun 24 '09 at 14:53
Plus, this removes the characters in *any order*: "site.ocm" > "site". — Eric O. Lebigot, May 05 '13 at 02:10
@scvalex, wow just realised this having used it that way for ages - it's dangerous because the code often happens to work anyway — Flash, Apr 02 '17 at 16:06
@AndreMiller In this specific case even `rstrip()` won't work because in the string `'abcdc.com'` it will just leave `abcd` as it has a `c` just before the dot. — SKR, Oct 22 '18 at 15:20
What's wrong with rsplit? Your solution already seems like the best one to me. — Ben Farmer, Dec 15 '22 at 02:49

score 846 · Accepted Answer · edited Nov 25 '20 at 06:43

strip doesn't mean "remove this substring". x.strip(y) treats y as a set of characters and strips any characters in that set from both ends of x.
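To make the character-set behaviour concrete, here is a small sketch using the strings from the question and the comments. The variable names are only illustrative.

```python
# str.strip / str.rstrip treat their argument as a set of characters,
# not as a substring to remove.
url = "abcdc.com"

# From the right, 'm', 'o', 'c', '.' and the extra 'c' are all in the
# set {'.', 'c', 'o', 'm'}, so they are all removed; 'd' stops the scan.
stripped_both = url.strip(".com")

# The order of the characters in the argument does not matter.
stripped_reordered = "site.ocm".rstrip(".com")

# strip() also removes matching characters from the front of the string.
stripped_front = "com.example.org".strip(".com")

print(stripped_both, stripped_reordered, stripped_front)
```

This is why the first attempt in the question loses the final c of abcdc.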

On Python 3.9 and newer you can use the removeprefix and removesuffix methods to remove an entire substring from either side of the string:

url = 'abcdc.com'
url.removesuffix('.com')    # Returns 'abcdc'
url.removeprefix('abcdc.')  # Returns 'com'

The relevant Python Enhancement Proposal is PEP-616.
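As a quick illustration of how removesuffix treats the edge cases discussed in this thread, here is a minimal sketch. It assumes Python 3.9 or newer, and the variable names are made up for the example.

```python
# Requires Python 3.9+ (str.removesuffix does not exist in older versions).
with_suffix = "abcdc.com".removesuffix(".com")      # suffix present
without_suffix = "abcdc.net".removesuffix(".com")   # suffix absent: string is unchanged
repeated = "abcdc.com.com".removesuffix(".com")     # only one trailing occurrence is removed
empty_suffix = "abcdc.com".removesuffix("")         # empty suffix: string is unchanged

print(with_suffix, without_suffix, repeated, empty_suffix)
```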

On Python 3.8 and older you can use endswith and slicing:

url = 'abcdc.com'
if url.endswith('.com'):
    url = url[:-4]

Or a regular expression:

import re
url = 'abcdc.com'
url = re.sub(r'\.com$', '', url)
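If the suffix comes from a variable, or the match should ignore case (domain names are not case sensitive), the regex approach could be wrapped roughly as follows. This is only a sketch: remove_suffix_ignore_case is a hypothetical helper name, re.escape is used so that the dot is matched literally, and \Z anchors the match at the very end of the string.

```python
import re

def remove_suffix_ignore_case(text, suffix):
    """Remove `suffix` from the end of `text`, ignoring case (hypothetical helper)."""
    # re.escape keeps characters such as '.' from acting as regex metacharacters.
    pattern = re.escape(suffix) + r"\Z"
    return re.sub(pattern, "", text, flags=re.IGNORECASE)

print(remove_suffix_ignore_case("EXAMPLE.COM", ".com"))
print(remove_suffix_ignore_case("www.comcast.net", ".com"))
```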

edited Nov 25 '20 at 06:43

Boris Verkhovskiy

answered Jun 24 '09 at 14:47

Steef

Which would be better: re.sub('\.com$', '', url) or url.rsplit('.com', 1)[0]? Or are they just different ways to solve the problem? – Ramya Jun 24 '09 at 15:01
4

Yeah, I myself think that the first example, with the endswith() test, would be the better one; the regex one would involve some performance penalty (parsing the regex, etc.). I wouldn't go with the rsplit() one, but that's because I don't know what you're exactly trying to achieve. I figure it's removing the .com if and only if it appears at the end of the url? The rsplit solution would give you trouble if you'd use it on domain names like 'www.commercialthingie.co.uk' – Steef Jun 24 '09 at 15:26
13

`url = url[:-4] if any(url.endswith(x) for x in ('.com','.net')) else url` – Burhan Khalid May 07 '13 at 04:56
1

What if I write `EXAMPLE.COM`? Domain names are not case sensitive. (This is a vote for the regex solution.) – Jasen Mar 26 '15 at 02:44
Regarding the first solution: Why rewrite rsplit()? It's already in the language: `url=url.rsplit(".com",1)[0]` -1 – Mike S Feb 07 '17 at 21:25
3

It is not a rewrite, the `rsplit()` solution doesn't have the same behaviour as the `endswith()` one when the original string does not have the substring at the end, but somewhere in the middle. For instance: `"www.comeandsee.com".rsplit(".com",1)[0] == "www.comeandsee"` but `"www.comeandsee.net".rsplit(".com",1)[0] == "www"` – Steef Feb 09 '17 at 13:44
5

The syntax `s[:-n]` has a caveat: for `n = 0`, this doesn't return the string with the last zero characters chopped off, but the empty string instead. – BlenderBender Jun 09 '18 at 14:51
As of python 3.6, regexes should be marked as such - `url = re.sub(r'\.com$', '', url)` (note the `r` in front of it the regex) - see https://www.flake8rules.com/rules/W605.html – Dale C. Anderson Oct 19 '22 at 18:07
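The `s[:-n]` caveat mentioned in the comments can be seen in a few lines. The helper chop_end below is a hypothetical name, and the sketch assumes 0 <= n <= len(s).

```python
def chop_end(s, n):
    """Return s without its last n characters (assumes 0 <= n <= len(s))."""
    # Using an explicit end index avoids the s[:-0] == s[:0] == '' trap.
    return s[:len(s) - n]

text = "abcdc.com"
naive_zero = text[:-0]         # -0 is just 0, so this is text[:0]
safe_zero = chop_end(text, 0)  # nothing chopped
safe_four = chop_end(text, 4)  # last four characters chopped

print(repr(naive_zero), repr(safe_zero), repr(safe_four))
```

This is the reason helpers based on `text[:-len(suffix)]` need to special-case the empty suffix.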

score 111 · Answer 2 · edited Jan 24 '19 at 17:22

If you are sure that the string only appears at the end, then the simplest way would be to use 'replace':

url = 'abcdc.com'
print(url.replace('.com',''))
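A short sketch of what happens when the substring also occurs earlier in the string, using the URL from the comments below. The variable names are illustrative.

```python
url = "www.computerhope.com"
suffix = ".com"

# str.replace removes every occurrence, including the one inside ".computerhope".
replaced_all = url.replace(suffix, "")

# Checking the end explicitly and slicing removes only the trailing occurrence.
suffix_only = url[:-len(suffix)] if url.endswith(suffix) else url

print(replaced_all)
print(suffix_only)
```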

edited Jan 24 '19 at 17:22

Peterino

answered Mar 06 '10 at 15:41

Charles Collis

64

that will also replace url like `www.computerhope.com`. do a check with `endswith()` and should be fine. – ghostdog74 Mar 07 '10 at 00:26
88

`"www.computerhope.com".endswith(".com")` is true, it still will break! – Mar 23 '15 at 20:31
2

"If you are sure that the string only appears at the end": do you mean "If you are sure that the substring appears only once"? replace seems to work also when the substring is in the middle, but as the other comment suggests, it will replace any occurrence of the substring. Why it should be at the end, I don't understand. – 463035818_is_not_an_ai Jan 22 '19 at 13:04

score 71 · Answer 3 · edited Dec 22 '20 at 03:02

def strip_end(text, suffix):
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)]
    return text

edited Dec 22 '20 at 03:02

Boris Verkhovskiy

answered Jun 24 '09 at 15:13

yairchu

@Boris I liked it before, without the extra check of whether suffix is empty – yairchu Dec 22 '20 at 12:40
1

@yairchu I copied the code from [PEP 616](https://www.python.org/dev/peps/pep-0616/) that introduced this exact function into the stdlib. The reason I also think this way is better is that the reason you have to do `len(text)-len(suffix)` is unclear when you can just use negative indices in Python (in fact, you fixed that bug in an edit and there used to be a comment here incorrectly telling you that you don't need the `len(text)`, so this seems error prone), whereas `if suffix` makes it clear exactly what you're _actually_ checking and why. – Boris Verkhovskiy Dec 22 '20 at 15:48

score 60 · Answer 4 · edited Nov 03 '16 at 09:49

Since it seems like nobody has pointed this out yet:

url = "www.example.com"
new_url = url[:url.rfind(".")]

This should be more efficient than the methods using split() as no new list object is created, and this solution works for strings with several dots.
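One thing to watch with this approach: rfind returns -1 when there is no dot, and slicing with -1 silently drops the last character. A guarded variant might look like the sketch below, where strip_after_last_dot is a hypothetical helper name.

```python
def strip_after_last_dot(text):
    """Drop the last '.' and everything after it; leave dot-free input alone."""
    dot_index = text.rfind(".")
    if dot_index == -1:
        # No dot at all: text[:-1] would wrongly remove the last character.
        return text
    return text[:dot_index]

# The unguarded one-liner applied to a string without a dot.
unguarded = "localhost"[:"localhost".rfind(".")]

print(unguarded)
print(strip_after_last_dot("localhost"))
print(strip_after_last_dot("www.example.com"))
```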

edited Nov 03 '16 at 09:49

Géry Ogam

answered Aug 04 '14 at 19:27

user3129181

Wow that is a nice trick. I couldn't get this to fail but I also had a hard time being able to think up ways this might fail. I like it but it is very "magical", hard to know what this does by just looking at it. I had to mentally process each part of line to "get it". – DevPlayer Apr 07 '15 at 13:32
23

This fails if the searched-for string is NOT present, and it wrongly removes the last character instead. – robbat2 Sep 19 '15 at 20:15

Xavier Guihot · Answer 5 · 2020-12-22T07:12:56.667

32

Starting in Python 3.9, you can use removesuffix instead:

'abcdc.com'.removesuffix('.com')
# 'abcdc'

edited Dec 22 '20 at 07:12

answered Apr 25 '20 at 20:57

Xavier Guihot

And the python code from the specification can be found in [PEP 616](https://www.python.org/dev/peps/pep-0616/#specification) – Paul Tobias Sep 30 '20 at 07:30

score 28 · Answer 6 · answered Jun 24 '09 at 14:59

Depends on what you know about your url and exactly what you're trying to do. If you know that it will always end in '.com' (or '.net' or '.org') then

 url=url[:-4]

is the quickest solution. If it's a more general URL then you're probably better off looking into the urlparse library that comes with Python.

If, on the other hand, you simply want to remove everything after the final '.' in a string then

url.rsplit('.',1)[0]

will work. Or if you just want everything up to the first '.' then try

url.split('.',1)[0]
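For the more general URL case mentioned above, the urlparse functionality lives in urllib.parse on Python 3. A minimal sketch, assuming you only want the host part before doing any suffix handling (hostname_of is a hypothetical helper name and the example URL is made up):

```python
from urllib.parse import urlsplit

def hostname_of(url):
    """Return the host part of a URL, or None if no network location was parsed."""
    return urlsplit(url).hostname

full = hostname_of("http://www.stackoverflow.com/some/path")

# Without a scheme or leading '//', the text is parsed as a path, not a host.
bare = hostname_of("abcdc.com")

print(full)
print(bare)
```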

score 17 · Answer 7 · edited Apr 10 '18 at 10:56

If you know it's an extension, then

url = 'abcdc.com'
...
url.rsplit('.', 1)[0]  # split at '.', starting from the right, maximum 1 split

This works equally well with abcdc.com or www.abcdc.com or abcdc.[anything] and is more extensible.

edited Apr 10 '18 at 10:56

Vadim Kotov

answered Jun 24 '09 at 14:57

JohnMetta

This seems the most obvious and cleanest way to me. Doesn't have to be an extension though, you can just split on the whole substring to be matched. – Ben Farmer Dec 15 '22 at 02:46

David Foster · Answer 8 · 2022-04-25T23:42:26.700

15

On Python 3.9+:

text.removesuffix(suffix)

On any Python version:

def remove_suffix(text, suffix):
    return text[:-len(suffix)] if text.endswith(suffix) and len(suffix) != 0 else text

or the one-liner:

remove_suffix = lambda text, suffix: text[:-len(suffix)] if text.endswith(suffix) and len(suffix) != 0 else text

edited Apr 25 '22 at 23:42

answered Oct 28 '12 at 20:17

David Foster

1

Or `text[:-len(suffix)] if suffix and text.endswith(suffix) else text` – Boris Verkhovskiy Dec 22 '20 at 03:10

score 7 · Answer 9 · answered Jun 24 '09 at 14:48

How about url[:-4]?

answered Jun 24 '09 at 14:48

Daren Thomas

4

Seems almost guaranteed to lead to a bug once you get hit with a `.ca` or `.co.uk` url. – Peter Jun 16 '20 at 00:27
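To avoid hard-coding the length 4, the slice can be derived from whichever suffix actually matches. The following is a sketch under the assumption that you have a known list of candidate suffixes; remove_any_suffix is a hypothetical helper name.

```python
def remove_any_suffix(text, suffixes):
    """Remove the longest matching suffix from text, if any (hypothetical helper)."""
    # Try longer suffixes first so that '.co.uk' wins over '.uk'.
    for suffix in sorted(suffixes, key=len, reverse=True):
        # Skip empty suffixes: text[:-0] would return the empty string.
        if suffix and text.endswith(suffix):
            return text[:-len(suffix)]
    return text

print(remove_any_suffix("example.co.uk", [".com", ".uk", ".co.uk"]))
print(remove_any_suffix("example.ca", [".com", ".ca"]))
print(remove_any_suffix("example.org", [".com"]))
```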

score 6 · Answer 10 · answered May 07 '13 at 04:49

For urls (as it seems to be a part of the topic by the given example), one can do something like this:

import os
url = 'http://www.stackoverflow.com'
name,ext = os.path.splitext(url)
print (name, ext)

#Or:
ext = '.'+url.split('.')[-1]
name = url[:-len(ext)]
print (name, ext)

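Both will output the pair 'http://www.stackoverflow' and '.com' (on Python 2 the print statement shows them as the tuple ('http://www.stackoverflow', '.com')).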

This can also be combined with str.endswith(suffix) if you need to just split ".com", or anything specific.

winni2k · Answer 11 · 2020-11-26T10:46:35.360

6

DISCLAIMER: This method has a critical flaw in that the partition is not anchored to the end of the url and may return spurious results. For example, the result for the URL "www.comcast.net" is "www" (incorrect) instead of the expected "www.comcast.net". This solution therefore is evil. Don't use it unless you know what you are doing!

url.rpartition('.com')[0]

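This is fairly easy to type and does not raise an error when the suffix '.com' is missing from url. Note, however, that in that case rpartition puts the whole string in the last element of its result, so [0] evaluates to the empty string rather than the original string.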
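The three cases can be checked directly, and the result of rpartition can be used to anchor the match at the end. This is only a sketch; remove_suffix_rpartition is a hypothetical helper name.

```python
found = "abcdc.com".rpartition(".com")             # separator found at the end
missing = "abcdc.net".rpartition(".com")           # separator not found
unanchored = "www.comcast.net".rpartition(".com")  # separator found in the middle

def remove_suffix_rpartition(text, suffix):
    """Remove suffix only when it sits at the very end of text (hypothetical helper)."""
    if not suffix:
        # rpartition raises ValueError for an empty separator.
        return text
    head, sep, tail = text.rpartition(suffix)
    # Accept the split only if the separator was found and nothing follows it.
    return head if sep and not tail else text

print(found, missing, unanchored)
print(remove_suffix_rpartition("www.comcast.net", ".com"))
```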

edited Nov 26 '20 at 10:46

answered Jul 13 '17 at 10:08

winni2k

1

+1 partition is preferred when only one split is needed since it always returns an answer, an IndexError won't occur. – Gringo Suave Oct 09 '18 at 16:58
1

This doesn't correctly handle the suffix not being there. For example, it will incorrectly return `www` for `www.comcast.net`. – Boris Verkhovskiy Nov 25 '20 at 06:49
1

That's a really excellent point @Boris! Thank you so much for pointing it out. – winni2k Nov 26 '20 at 10:43

score 5 · Answer 12 · answered Apr 10 '20 at 18:31

Assuming you want to remove the domain, no matter what it is (.com, .net, etc.), I recommend finding the last . and removing everything from that point on.

url = 'abcdc.com'
dot_index = url.rfind('.')
url = url[:dot_index]

Here I'm using rfind to solve the problem of urls like abcdc.com.net which should be reduced to the name abcdc.com.

If you're also concerned about www.s, you should explicitly check for them:

if url.startswith("www."):
   url = url.replace("www.","", 1)

The 1 in replace is for strange edge cases like www.net.www.com

If your url gets any wilder than that look at the regex answers people have responded with.

score 4 · Answer 13 · edited May 15 '20 at 05:35

If you mean to only strip the extension:

'.'.join('abcdc.com'.split('.')[:-1])
# 'abcdc'

It works with any extension, with potential other dots existing in filename as well. It simply splits the string as a list on dots and joins it without the last element.

edited May 15 '20 at 05:35

Xavier Guihot

answered Jul 13 '17 at 12:56

Dcs

user1424589 · Answer 14 · 2020-08-29T02:13:13.440

These are my best solutions for stripping the end of a string if it is present and doing nothing otherwise. You will probably want one of the first two implementations; I have included the third for completeness.

For a constant suffix:

def remove_suffix(v, s):
    return v[:-len(s)] if s and v.endswith(s) else v  # `s and` guards against v[:-0] == ''
remove_suffix("abc.com", ".com") == 'abc'
remove_suffix("abc", ".com") == 'abc'

For a regex:

import re

def remove_suffix_compile(suffix_pattern):
    r = re.compile(f"(.*?)({suffix_pattern})?$")
    return lambda v: r.match(v)[1]
remove_domain = remove_suffix_compile(r"\.[a-zA-Z0-9]{3,}")
remove_domain("abc.com") == "abc"
remove_domain("sub.abc.net") == "sub.abc"
remove_domain("abc.") == "abc."
remove_domain("abc") == "abc"

For a collection of constant suffixes the asymptotically fastest way for a large number of calls:

def remove_suffix_preprocess(*suffixes):
    suffixes = set(suffixes)
    try:
        suffixes.remove('')
    except KeyError:
        pass

    def helper(suffixes, pos):
        if len(suffixes) == 1:
            suf = suffixes[0]
            l = -len(suf)
            ls = slice(0, l)
            return lambda v: v[ls] if v.endswith(suf) else v
        si = iter(suffixes)
        ml = len(next(si))
        exact = False
        for suf in si:
            l = len(suf)
            if -l == pos:
                exact = True
            else:
                ml = min(len(suf), ml)
        ml = -ml
        suffix_dict = {}
        for suf in suffixes:
            sub = suf[ml:pos]
            if sub in suffix_dict:
                suffix_dict[sub].append(suf)
            else:
                suffix_dict[sub] = [suf]
        if exact:
            del suffix_dict['']
            for key in suffix_dict:
                suffix_dict[key] = helper([s[:pos] for s in suffix_dict[key]], None)
            return lambda v: suffix_dict.get(v[ml:pos], lambda v: v)(v[:pos])
        else:
            for key in suffix_dict:
                suffix_dict[key] = helper(suffix_dict[key], ml)
            return lambda v: suffix_dict.get(v[ml:pos], lambda v: v)(v)
    return helper(tuple(suffixes), None)
domain_remove = remove_suffix_preprocess(".com", ".net", ".edu", ".uk", '.tv', '.co.uk', '.org.uk')

The final one is probably significantly faster in PyPy than in CPython. In CPython, the regex variant is likely faster than this in virtually all cases that do not involve huge dictionaries of potential suffixes that cannot easily be represented as a regex.

In PyPy, the regex variant is almost certainly slower for a large number of calls or long strings, even if the re module uses a DFA-compiling regex engine, because the vast majority of the overhead of the lambdas will be optimized out by the JIT.

In CPython, however, the fact that you're running C code for the regex comparison almost certainly outweighs the algorithmic advantages of the suffix-collection version in almost all cases.

Edit: https://m.xkcd.com/859/

score 3 · Answer 15 · answered Oct 06 '20 at 14:38

Because this is a very popular question, I am adding another solution that is now available. Python 3.9 (https://docs.python.org/3.9/whatsnew/3.9.html) adds the method removesuffix() (and removeprefix()), which does exactly what was asked here.

url = 'abcdc.com'
print(url.removesuffix('.com'))

output:

'abcdc'

PEP 616 (https://www.python.org/dev/peps/pep-0616/) shows how it will behave (it is not the real implementation):

def removeprefix(self: str, prefix: str, /) -> str:
    if self.startswith(prefix):
        return self[len(prefix):]
    else:
        return self[:]

and what benefits it has against self-implemented solutions:

Less fragile: The code will not depend on the user to count the length of a literal.
More performant: The code does not require a call to the Python built-in len function nor to the more expensive str.replace() method.
More descriptive: The methods give a higher-level API for code readability as opposed to the traditional method of string slicing.
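Since the question is about suffixes, a suffix counterpart written in the same style as the removeprefix sketch above might look like this. It is an illustration of the behaviour, not the real implementation, and removesuffix_sketch is a hypothetical name.

```python
def removesuffix_sketch(text: str, suffix: str) -> str:
    """Pure-Python illustration of suffix removal (not the real implementation)."""
    # Guard against an empty suffix: text[:-0] would be the empty string.
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)]
    return text[:]

print(removesuffix_sketch("abcdc.com", ".com"))
print(removesuffix_sketch("abcdc.com", ""))
```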

Someone already posted about this 8 months before you did https://stackoverflow.com/a/61432508 — Boris Verkhovskiy, Nov 25 '20 at 06:52

score 2 · Answer 16 · edited Jun 10 '20 at 05:52

import re

def rm_suffix(url='abcdc.com', suffix=r'\.com'):
    return re.sub(suffix + '$', '', url)

I want to repeat this answer as the most expressive way to do it. Of course, the following would take less CPU time:

def rm_dotcom(url = 'abcdc.com'):
    return(url[:-4] if url.endswith('.com') else url)

However, if CPU is the bottleneck, why write in Python?

When is CPU a bottleneck anyway? In drivers, maybe.

The advantage of using a regular expression is code reusability. What if you next want to remove '.me', which only has three characters?

The same code would do the trick:

>>> rm_suffix('abcdc.me', r'\.me')
'abcdc'

score 1 · Answer 17 · edited May 15 '20 at 05:32

You can use split:

'abccomputer.com'.split('.com',1)[0]
# 'abccomputer'

edited May 15 '20 at 05:32

Xavier Guihot

answered Dec 07 '12 at 08:57

Lucas

9

When `a = 'www.computerbugs.com'` this results in `'www'` – yairchu May 04 '13 at 23:02
Can do it from the reverse end I guess? Not sure if there is a way to write it more readably: `'www.computerbugs.com'[::-1].split('.com'[::-1], 1)[-1][::-1]` – Ben Farmer Dec 15 '22 at 02:43
Ah rsplit is the way: `'www.computerbugs.com'.rsplit('.com', 1)[0]` – Ben Farmer Dec 15 '22 at 02:45

juan Isaza · Answer 18 · 2016-09-28T19:33:02.403

In my case I needed to raise an exception so I did:

class UnableToStripEnd(Exception):
    """An exception type to indicate that the suffix cannot be removed from the text."""

    @staticmethod
    def get_exception(text, suffix):
        return UnableToStripEnd("Could not find suffix ({0}) on text: {1}."
                                .format(suffix, text))


def strip_end(text, suffix):
    """Removes the end of a string. Otherwise fails."""
    if not text.endswith(suffix):
        raise UnableToStripEnd.get_exception(text, suffix)
    return text[:len(text)-len(suffix)]

mmj · Answer 19 · 2020-11-09T14:49:20.177

A broader solution, adding the possibility to replace the suffix (you can remove by replacing with the empty string) and to set the maximum number of replacements:

def replacesuffix(s,old,new='',limit=1):
    """
    String suffix replace; if the string ends with the suffix given by parameter `old`, such suffix is replaced with the string given by parameter `new`. The number of replacements is limited by parameter `limit`, unless `limit` is negative (meaning no limit).

    :param s: the input string
    :param old: the suffix to be replaced
    :param new: the replacement string. Default value the empty string (suffix is removed without replacement).
    :param limit: the maximum number of replacements allowed. Default value 1.
    :returns: the input string with a certain number (depending on parameter `limit`) of the rightmost occurrences of string given by parameter `old` replaced by string given by parameter `new`
    """
    if s[len(s)-len(old):] == old and limit != 0:
        return replacesuffix(s[:len(s)-len(old)],old,new,limit-1) + new
    else:
        return s

In your case, given the default arguments, the desired result is obtained with:

replacesuffix('abcdc.com','.com')
# 'abcdc'

Some more general examples:

replacesuffix('whatever-qweqweqwe','qwe','N',2)
# 'whatever-qweNN'

replacesuffix('whatever-qweqweqwe','qwe','N',-1)
# 'whatever-NNN'

replacesuffix('12.53000','0',' ',-1)
# '12.53   '

score 0 · Answer 20 · answered Jun 24 '09 at 14:53

This is a perfect use for regular expressions:

>>> import re
>>> re.match(r"(.*)\.com", "hello.com").group(1)
'hello'

answered Jun 24 '09 at 14:53

Aaron Maenpaa

7

You should also add a $ to make sure that you're matching hostnames *ending* in ".com". – Cristian Ciupitu Jun 24 '09 at 14:56
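One way to apply the anchoring suggested in the comment, and to cope with strings that do not match at all, is re.fullmatch. The sketch below assumes that non-matching input should be returned unchanged; strip_dotcom is a hypothetical helper name.

```python
import re

def strip_dotcom(host):
    """Return host without a trailing '.com'; other input is returned unchanged."""
    # fullmatch requires the pattern to cover the whole string, so '.com'
    # must be the very end of host.
    match = re.fullmatch(r"(.*)\.com", host)
    return match.group(1) if match else host

# Without an end anchor, re.match accepts '.com' in the middle of the string.
unanchored = re.match(r"(.*)\.com", "hello.community").group(1)

print(unanchored)
print(strip_dotcom("hello.community"))
print(strip_dotcom("hello.com"))
```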

score 0 · Answer 21 · answered Feb 24 '23 at 14:57

Use the public suffix list hosted by Mozilla. It's available as the tldextract python library.

import tldextract

url = 'abcdc.com'

# Extract the domain and TLD
extracted = tldextract.extract(url)
domain, tld = extracted.domain, extracted.suffix

if tld and tld != 'localhost':
    url_without_tld = domain
else:
    url_without_tld = url

print(url_without_tld)

score 0 · Answer 22 · answered Aug 31 '23 at 19:18

A function to remove a suffix in Python 3.8:

def removesuffix(text, suffix):
    # The `suffix and` guard keeps an empty suffix from emptying the string via text[:-0].
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)]
    else:
        return text

answered Aug 31 '23 at 19:18

alemol

score -1 · Answer 23 · answered Mar 14 '20 at 07:50

Here is the simplest code I have.

url=url.split(".")[0]

answered Mar 14 '20 at 07:50

Anshuman Jayaprakash

1

I think you mean `url = url.split(".")[:-1]`. – ingyhere Jul 28 '22 at 01:49

score -1 · Answer 24 · answered Jun 22 '21 at 08:27

Using replace and count

This might seem like a bit of a hack, but it gives you a safe replacement without using startswith and an if statement: the count argument of replace limits the replacement to one occurrence:

mystring = "www.comwww.com"

Prefix:

print(mystring.replace("www.","",1))

Suffix (write the suffix reversed, so .com becomes moc.):

print(mystring[::-1].replace("moc.","",1)[::-1])

answered Jun 22 '21 at 08:27

G M

Why would you do this ... – Smuuf May 23 '22 at 09:08

score -3 · Answer 25 · answered May 07 '20 at 15:39

I used the built-in rstrip function to do it as follows:

string = "test.com"
suffix = ".com"
newstring = string.rstrip(suffix)
print(newstring)
test

answered May 07 '20 at 15:39

Zioalex

2

Bad idea. Try `"test.ccom"`. – Shital Shah May 12 '20 at 23:53
But this is not the point of the question. It was just asked to remove a known substring from the end of another. This works exactly as expected. – Zioalex May 13 '20 at 07:52
3

@Alex try your solution with mooc.com or maximo.com – lorenzo Jul 28 '20 at 15:30
Yes @lorenzo I tried now and it doesn't work properly because it eats everything. So although it works for the example it is not general enough. I tried with split and it works better but still not completely general: In [13]: string = "testcom.cp.com" ...: suffix = ".com" ...: newstring = string.split(suffix) ...: print(newstring[0]) testcom.cp – Zioalex Aug 03 '20 at 08:11
Your answer demonstrates the same misunderstanding about what `strip`/`rstrip`/`lstrip` do as the person asking the question. They treat the string passed to them as a *set of characters* not a literal suffix/prefix to be removed. Please read [the documentation](https://docs.python.org/3/library/stdtypes.html#str.rstrip). – Boris Verkhovskiy Nov 25 '20 at 06:57
The question **already tries** this approach and **demonstrates the exact problem**. – Karl Knechtel Jan 29 '23 at 18:21
