1

How can the link be removed from this string?

s=' hello how are you www.ford.com today '

so that the output is

s='hello how are you today'
falsetru
Mustard Tiger

4 Answers

7

Try the following generator expression, which omits words of the pattern www._____.com:

# The len(item) > 7 check makes sure that words like www.com, which aren't real URLs, aren't removed.
' '.join(item for item in s.split() if not (item.startswith('www.') and item.endswith('.com') and len(item) > 7))

>>> s=' hello how are you www.ford.com today '
>>> ' '.join(item for item in s.split() if not (item.startswith('www.') and item.endswith('.com') and len(item) > 7))
'hello how are you today'
>>> 
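As a rough sketch, the same filter can be wrapped in a small function so the URL test has a name. The helper names `looks_like_www_com` and `drop_www_com_words` are made up for illustration and are not part of the answer above.

```python
def looks_like_www_com(word):
    # Same test as the one-liner: www.<something>.com and longer than "www.com".
    return word.startswith('www.') and word.endswith('.com') and len(word) > 7

def drop_www_com_words(text):
    # split() with no arguments also discards leading/trailing/repeated whitespace.
    return ' '.join(word for word in text.split() if not looks_like_www_com(word))

print(drop_www_com_words(' hello how are you www.ford.com today '))
```

Because the check is per word, a URL followed directly by punctuation (for example `www.ford.com,`) does not end with `.com` and would be kept.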
A.J. Uppal
3

This seems like a good situation for a regex substitution.

>>> import re
>>> s = ' hello how are you www.ford.com today www.example.co.jp '
>>> re.sub(r'\s*(?:https?://)?www\.\S*\.[A-Za-z]{2,5}\s*', ' ', s).strip()
'hello how are you today'

The above finds any string that starts with potential whitespace, then possibly https:// or http://, then www., then any non-whitespace characters, then . followed by 2-5 alphabetical characters, then potential whitespace. It replaces such strings with a single space, and then removes leading and trailing whitespace from the result.
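The breakdown above can be written into the pattern itself with `re.VERBOSE`. This is only a sketch of the same regex; the names `URL_PATTERN` and `strip_urls` are hypothetical.

```python
import re

# Same pattern as above, one component per line (URL_PATTERN is a made-up name).
URL_PATTERN = re.compile(
    r"""
    \s*                 # optional leading whitespace
    (?:https?://)?      # optional http:// or https://
    www\.               # literal "www."
    \S*                 # any run of non-whitespace characters
    \.[A-Za-z]{2,5}     # a dot followed by 2-5 letters
    \s*                 # optional trailing whitespace
    """,
    re.VERBOSE,
)

def strip_urls(text):
    # Replace each match with one space, then trim both ends.
    return URL_PATTERN.sub(' ', text).strip()

print(strip_urls(' hello how are you www.ford.com today www.example.co.jp '))
```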

Note that this is a naive example of a URL, as defined by your specific example. See this answer for a regex with a more complete definition of what constitutes a URL.

Community
TigerhawkT3
2

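While you can certainly use string methods, I prefer the regular-expression-based approach. Because \S+ matches only non-whitespace characters, the pattern never matches across a space, so a stray "www." and ".com" in separate words are left alone.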

import re

s = " hello www.something.com there bobby"
s = re.sub(r'www\.\S+\.com', '', s)
print(s)  # " hello  there bobby": the spaces around the URL remain
s = "hello www. begins and .com ends"
s = re.sub(r'www\.\S+\.com', '', s)
print(s)  # "hello www. begins and .com ends": nothing matched, so nothing changed
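The substitution above leaves the whitespace that surrounded the URL in place. One possible follow-up, sketched here with a hypothetical helper name `remove_www_com`, is to normalize whitespace after the substitution:

```python
import re

def remove_www_com(text):
    # Remove www.<non-whitespace>.com, as in the answer above.
    without_urls = re.sub(r'www\.\S+\.com', '', text)
    # split()/join collapses runs of whitespace and trims both ends.
    return ' '.join(without_urls.split())

print(remove_www_com(" hello www.something.com there bobby"))
```

Note that this also collapses any other repeated whitespace in the string, which may or may not be wanted.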
Ben
0

To deal with the case where there is no space around the URL, you can use the string split method like this:

# Only handles a single URL; requires both markers to be present.
if "www." in s and ".com" in s:
    s = ''.join((s.split("www.")[0], " ", s.split(".com")[1]))
Natecat
  • Your expression fails on sentences like: `The prefix www. often starts a url, while .com ends it` – Gerrat Mar 31 '16 at 02:40
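To make the comment concrete, here is a small sketch comparing the split approach with the `www\.\S+\.com` regex from another answer on that sentence. The function names `remove_by_split` and `remove_by_regex` are made up for the comparison.

```python
import re

def remove_by_split(text):
    # The split-based approach from the answer above.
    if ".com" in text:
        return ''.join((text.split("www.")[0], " ", text.split(".com")[1]))
    return text

def remove_by_regex(text):
    # Requires non-whitespace between "www." and ".com", so it cannot span words.
    return re.sub(r'www\.\S+\.com', ' ', text)

sentence = "The prefix www. often starts a url, while .com ends it"
print(repr(remove_by_split(sentence)))   # the middle of the sentence is lost
print(repr(remove_by_regex(sentence)))   # unchanged
print(repr(remove_by_regex("gotowww.ford.comnow")))
```

The regex still handles a URL with no surrounding spaces, since it does not depend on word boundaries.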