python, remove link from string

Question

How can the link be removed from this string:

s=' hello how are you www.ford.com today '

so that the output is

s='hello how are you today'

Are you asking for a general solution or just for that string, because urls can be extremely diverse — Natecat, Mar 31 '16 at 02:24
a solution that can handle any sub-string in the form of www.something.com — Mustard Tiger, Mar 31 '16 at 02:26

A.J. Uppal · Accepted Answer · 2016-03-31T02:45:46.783

7

Try the following generator expression, which omits words matching the pattern www._____.com:

# The len(item) > 7 check keeps the bare string www.com, which isn't a real URL, from being removed.
' '.join(item for item in s.split() if not (item.startswith('www.') and item.endswith('.com') and len(item) > 7))

>>> s=' hello how are you www.ford.com today '
>>> ' '.join(item for item in s.split() if not (item.startswith('www.') and item.endswith('.com') and len(item) > 7))
'hello how are you today'
>>>
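As a minimal sketch, the same filter can be wrapped in a function so the rule is easier to read and test. The names `remove_www_com_tokens` and `looks_like_link` are made up for illustration and are not part of the answer above.

```python
def remove_www_com_tokens(text):
    """Drop whitespace-separated words that look like www.<name>.com."""
    def looks_like_link(word):
        # len(word) > 7 excludes the bare string "www.com"
        return word.startswith('www.') and word.endswith('.com') and len(word) > 7

    return ' '.join(word for word in text.split() if not looks_like_link(word))


print(remove_www_com_tokens(' hello how are you www.ford.com today '))
# hello how are you today
print(remove_www_com_tokens('visit www.com sometime'))
# visit www.com sometime
```

Because it works word by word, this only removes a link that is separated from its neighbours by whitespace.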

edited Mar 31 '16 at 02:45

answered Mar 31 '16 at 02:27

A.J. Uppal


That is a very elegant and readable solution – Natecat Mar 31 '16 at 02:29
@Natecat not sure if you're being sarcastic :) – A.J. Uppal Mar 31 '16 at 02:30
What if the string is s=' hello how are youwww.ford.comtoday ', with no spaces between the link and the surrounding words? @A.J. – Mustard Tiger Mar 31 '16 at 02:31
@abcla My answer addresses this case – Natecat Mar 31 '16 at 02:40
Wouldn't `www.com`, which is not a URL, get removed by this? – TigerhawkT3 Mar 31 '16 at 02:42
@TigerhawkT3 fixed :) – A.J. Uppal Mar 31 '16 at 02:46

score 3 · Answer 2 · edited May 23 '17 at 11:59

This seems like a good situation for a regex substitution.

>>> import re
>>> s = ' hello how are you www.ford.com today www.example.co.jp '
>>> re.sub(r'\s*(?:https?://)?www\.\S*\.[A-Za-z]{2,5}\s*', ' ', s).strip()
'hello how are you today'

The pattern matches optional whitespace, then an optional https:// or http://, then www., then any run of non-whitespace characters, then a . followed by 2-5 letters, then optional whitespace. Each match is replaced with a single space, and strip() then removes leading and trailing whitespace from the result.

Note that this is a naive example of a URL, as defined by your specific example. See this answer for a regex with a more complete definition of what constitutes a URL.
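For readability, the same pattern can be written with `re.VERBOSE` so each piece carries its own comment. This is only a sketch of the regex from this answer; the names `LINK_PATTERN` and `strip_links` are hypothetical.

```python
import re

# Same pattern as in the answer, spelled out piece by piece.
LINK_PATTERN = re.compile(r"""
    \s*                # optional leading whitespace
    (?:https?://)?     # optional http:// or https://
    www\.              # literal "www."
    \S*                # any run of non-whitespace characters
    \.[A-Za-z]{2,5}    # a dot followed by 2-5 letters
    \s*                # optional trailing whitespace
""", re.VERBOSE)


def strip_links(text):
    # Replace each match with one space, then trim the ends.
    return LINK_PATTERN.sub(' ', text).strip()


print(strip_links(' hello how are you www.ford.com today www.example.co.jp '))
# hello how are you today
print(strip_links('see https://www.python.org for docs'))
# see for docs
```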

Ben · Answer 3 · 2016-03-31T02:48:48.233

2

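While you can certainly use string methods, I prefer the regular-expression-based approach. Because \S+ cannot match whitespace, it leaves the text alone when there are spaces between www. and .com, as in the second example below.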

import re

s = " hello www.something.com there bobby"
s = re.sub(r'www\.\S+\.com', '', s)
print(s)  # " hello  there bobby" (the spaces on both sides of the link remain)
s = "hello www. begins and .com ends"
s = re.sub(r'www\.\S+\.com', '', s)
print(s)  # "hello www. begins and .com ends" (unchanged: \S+ cannot span the spaces)
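One way to tidy the leftover double space, and to cope with a link that runs straight into its neighbouring words, is to substitute a space and then collapse whitespace. This is a sketch built on the pattern above; the function name `remove_www_com` is made up, and always separating the neighbours with a space is an assumption about the desired output.

```python
import re


def remove_www_com(text):
    # Replace the link with a space so neighbouring words do not fuse together,
    # then collapse every run of whitespace into a single space.
    without_links = re.sub(r'www\.\S+\.com', ' ', text)
    return ' '.join(without_links.split())


print(remove_www_com(' hello www.something.com there bobby'))
# hello there bobby
print(remove_www_com(' hello how are youwww.ford.comtoday '))
# hello how are you today
print(remove_www_com('hello www. begins and .com ends'))
# hello www. begins and .com ends
```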

edited Mar 31 '16 at 02:48

answered Mar 31 '16 at 02:40

Ben


This fails on the same condition as mine as Gerrat mentioned – Natecat Mar 31 '16 at 02:42
@Natecat should work with spaces between the phrases now. – Ben Mar 31 '16 at 02:49
If `s = "hello www. begins and .com ends"` and I also want to print `hello there bobby`, how can I remove the single whitespace within the URL link? – user3849475 Sep 19 '17 at 02:34

score 0 · Answer 4 · answered Mar 31 '16 at 02:36

0

To deal with the case where there is no space around the URL, you can use the string split method like this:

# Only handles a single www.<name>.com link; both markers must be present.
if "www." in s and ".com" in s:
    s = ''.join((s.split("www.")[0], " ", s.split(".com")[1]))
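A slightly more guarded variant of the same idea, as a sketch only: locate `www.`, look for `.com` after it, and give up if the text in between contains whitespace. The function name `remove_first_link` is hypothetical. It handles only the first `www.` in the string and is not a general URL matcher.

```python
def remove_first_link(text):
    start = text.find('www.')
    if start == -1:
        return text
    # Search for ".com" only after the "www." prefix.
    end = text.find('.com', start + len('www.'))
    if end == -1:
        return text
    candidate = text[start:end + len('.com')]
    # Reject candidates that span whitespace; those are ordinary prose, not a link.
    if any(ch.isspace() for ch in candidate):
        return text
    return text[:start] + ' ' + text[end + len('.com'):]


print(repr(remove_first_link(' hello how are youwww.ford.comtoday ')))
# ' hello how are you today '
print(remove_first_link('The prefix www. often starts a url, while .com ends it'))
# The prefix www. often starts a url, while .com ends it
```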

answered Mar 31 '16 at 02:36

Natecat


Your expression fails on sentences like: `The prefix www. often starts a url, while .com ends it` – Gerrat Mar 31 '16 at 02:40
