
I am cleaning URLs out of my data. I tried:

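from urllib.parse import urlparse  # Python 3; Python 2 used: from urlparse import urlparse

s = 'hello http://www.google.com I am william http://www.google.com'

# keep only the tokens that urlparse does not recognize as having a scheme (e.g. "http")
clean = ' '.join(el for el in s.split() if not urlparse(el).scheme)

print(clean)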

Desired output:

hello I am william

However, this time I would like to achieve the same output using a regular expression instead.
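One way to keep the same token-by-token logic and only swap `urlparse` for a regular expression is sketched below. It assumes a token counts as a URL when it starts with something shaped like a scheme followed by a colon (a letter, then letters, digits, `+`, `-` or `.`), which roughly approximates the `urlparse(el).scheme` test above; `SCHEME_RE` and `drop_url_tokens` are hypothetical names used only for illustration.

```python
import re

# Assumption: a token is a URL if it begins with a scheme-like prefix such as "http:".
# This only approximates urlparse's scheme detection; it is not a full URL grammar.
SCHEME_RE = re.compile(r'[A-Za-z][A-Za-z0-9+.-]*:')

def drop_url_tokens(text):
    # re.match anchors at the start of each whitespace-separated token,
    # so only tokens that begin with a scheme-like prefix are dropped.
    return ' '.join(tok for tok in text.split() if not SCHEME_RE.match(tok))

s = 'hello http://www.google.com I am william http://www.google.com'
print(drop_url_tokens(s))  # hello I am william
```

Because the tokens are re-joined with single spaces, no stray whitespace is left behind. Note that this will also drop ordinary words ending in a colon, such as "Note:".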

Cœur
neo33
  • This is an awkward issue. See https://mathiasbynens.be/demo/url-regex for some attempts at perfect URL regexes. If you know your URLs will always have a certain format, this problem becomes much simpler. – Jared Goguen Jan 11 '17 at 18:48
  • Check [here](http://stackoverflow.com/questions/6718633/python-regular-expression-again-match-url) and [here](http://stackoverflow.com/questions/6883049/regex-to-find-urls-in-string-in-python) and also [here](http://stackoverflow.com/questions/520031/whats-the-cleanest-way-to-extract-urls-from-a-string-using-python) – yorodm Jan 11 '17 at 18:50
  • https://regex101.com/ is a decent online, Python-flavored regex tester. – wwii Jan 11 '17 at 20:46

1 Answer


Use a replacement with re.sub:

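import re

s = 'hello http://www.google.com I am william http://www.google.com'
# raw string, so "\S" and "\s" reach the regex engine without invalid-escape warnings;
# note that the space before the last URL is kept, leaving a trailing space
print(re.sub(r'http\S+\s?', '', s))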

Prints:

hello I am william
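The pattern above leaves a trailing space when a URL is the last word, because `\s?` only consumes the whitespace after a URL, not before it. A minimal sketch that removes URLs and then normalizes the leftover whitespace follows; it assumes URLs start with `http://`, `https://` or `www.` and run until the next whitespace, and `URL_RE` and `strip_urls` are hypothetical names.

```python
import re

# Assumption: a URL starts with http://, https:// or www. and ends at the next whitespace.
URL_RE = re.compile(r'(?:https?://|www\.)\S+')

def strip_urls(text):
    # Remove every URL, then collapse runs of whitespace left behind
    # (including leading/trailing spaces) into single spaces.
    without_urls = URL_RE.sub('', text)
    return ' '.join(without_urls.split())

s = 'hello http://www.google.com I am william http://www.google.com'
print(strip_urls(s))  # hello I am william
```

Restricting the prefix to `https?://` or `www.` also avoids matching ordinary words that merely start with "http".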
mike.k