How can the link be removed from this string:
s=' hello how are you www.ford.com today '
so that the output is
s='hello how are you today'
Try the following generator expression, which omits words of the pattern www._____.com:
' '.join(item for item in s.split() if not (item.startswith('www.') and item.endswith('.com') and len(item) > 7))  # the len(item) check makes sure that words like www.com, which aren't real URLs, aren't removed
>>> s=' hello how are you www.ford.com today '
>>> ' '.join(item for item in s.split() if not (item.startswith('www.') and item.endswith('.com') and len(item) > 7))
'hello how are you today'
>>>
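If the one-liner is hard to read, the same filter can be wrapped in a small function. This is only a sketch of the approach above; the names drop_www_com_words and looks_like_link are made up for illustration.

```python
def drop_www_com_words(text):
    """Return text without whitespace-separated words shaped like www.<name>.com."""
    def looks_like_link(word):
        # len(word) > 7 excludes the bare "www.com", which is exactly 7 characters
        return word.startswith('www.') and word.endswith('.com') and len(word) > 7

    # split() with no arguments also discards leading, trailing and repeated whitespace
    return ' '.join(word for word in text.split() if not looks_like_link(word))


print(drop_www_com_words(' hello how are you www.ford.com today '))  # hello how are you today
print(drop_www_com_words('visit www.com now'))  # visit www.com now
```

Note that this only catches links that stand alone as a word; a link glued to other text, or one ending in something other than .com, passes through untouched.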
This seems like a good situation for a regex substitution.
>>> import re
>>> s = ' hello how are you www.ford.com today www.example.co.jp '
>>> re.sub(r'\s*(?:https?://)?www\.\S*\.[A-Za-z]{2,5}\s*', ' ', s).strip()
'hello how are you today'
The above finds any string that starts with optional whitespace, then possibly https:// or http://, then www., then any non-whitespace characters, then a literal . followed by 2-5 alphabetic characters, then optional whitespace. It replaces each such string with a single space, and then removes leading and trailing whitespace from the result.
Note that this is a naive example of a URL, as defined by your specific example. See this answer for a regex with a more complete definition of what constitutes a URL.
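To see where this naive pattern stops working, it may help to compile it once and try a couple of inputs. The following is a sketch using the same regex as above; URL_PATTERN and strip_links are illustrative names, not part of any library.

```python
import re

# Same pattern as in the answer above, compiled once for reuse
URL_PATTERN = re.compile(r'\s*(?:https?://)?www\.\S*\.[A-Za-z]{2,5}\s*')


def strip_links(text):
    # Replace each match (including its surrounding whitespace) with one space, then trim
    return URL_PATTERN.sub(' ', text).strip()


print(strip_links('see http://www.ford.com now'))  # see now
print(strip_links('see https://ford.com now'))     # see https://ford.com now
```

The second call leaves the text unchanged because the pattern requires a literal www., so links without that prefix are not detected.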
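While you can certainly use string methods, I prefer the regular-expression approach. Because \S+ matches only non-whitespace characters, it leaves the text alone when www. and .com are separated by spaces, as the second example shows.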
import re
s = " hello www.something.com there bobby"
s = re.sub(r'www\.\S+\.com', '', s)
print(s)  # " hello  there bobby" (the spaces around the link remain)
s = "hello www. begins and .com ends"
s = re.sub(r'www\.\S+\.com', '', s)  # no match: whitespace separates www. and .com
print(s)  # hello www. begins and .com ends
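Substituting an empty string leaves the whitespace on both sides of the link in place. If you want the single-spaced output from the question, one option is to normalize the whitespace afterwards. A minimal sketch; the variable names without_link and tidy are just for illustration.

```python
import re

s = " hello www.something.com there bobby"
without_link = re.sub(r'www\.\S+\.com', '', s)

# split() with no arguments splits on any run of whitespace and drops empty pieces,
# so joining with one space collapses doubles and trims both ends
tidy = ' '.join(without_link.split())

print(repr(without_link))  # ' hello  there bobby'
print(repr(tidy))          # 'hello there bobby'
```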
To handle the case where there is no space around the URL, you can use the string split method like this:
if "www." in s and ".com" in s:
    # keep the text before "www." and the text after the first ".com"
    s = ''.join((s.split("www.")[0], " ", s.split(".com", 1)[1]))
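A variant of the same idea with str.partition avoids indexing into split results and only looks for .com after the www. marker. This is a sketch under the assumption that there is at most one link in the string; cut_embedded_link is a hypothetical helper name.

```python
def cut_embedded_link(text):
    """Remove the first www. ... .com span, even when it is glued to other words."""
    before, marker, rest = text.partition('www.')
    if not marker:
        return text  # no "www." at all
    _, dot_com, after = rest.partition('.com')
    if not dot_com:
        return text  # "www." without a closing ".com"
    # Put a single space where the link used to be
    return before + ' ' + after


print(cut_embedded_link('hellowww.ford.comtoday'))  # hello today
print(cut_embedded_link('no link here'))            # no link here
```

If the link already has spaces around it, the result will contain extra spaces, so you may still want to normalize whitespace afterwards.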