-1

I have a file with a lot of text. Unfortunately, some of the URLs in it have spaces before or after the dots. Example: http://www .test27d .com/site1

How can I remove these spaces so that only the URLs are corrected and the rest of the text is left alone? Elsewhere in the text, a space before or after a dot is sometimes required.

StMan
  • What is the correct URL here: `http://www.test27d` or `http://www.test27d.com/site1`? Both are valid. –  Oct 26 '18 at 13:27
  • 2
    It is impossible, since there is no way to detect URLs if they have spaces in them... I think you should correct it by hand in a text editor – Mael Galliffet Oct 26 '18 at 13:29
  • Possible duplicate of [Remove all whitespace in a string in Python](https://stackoverflow.com/questions/8270092/remove-all-whitespace-in-a-string-in-python) – VinothRaja Oct 26 '18 at 13:29
  • 1
    @VinothRaja No, not all. Read the question. –  Oct 26 '18 at 13:30
  • What does the other text look like? – mad_ Oct 26 '18 at 13:31
  • Depending on the text, and looking at your example, replacing a space followed by a dot may work, i.e. `text.replace(' .','.')`. Does that help? – DSLima90 Oct 26 '18 at 13:36

2 Answers

2

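Find every string that matches the URL pattern (starting with http://), then remove the spaces from each match:

import re

a = 'http://www .test27d .com/site1'
# Match from the start: http:// followed by word characters, whitespace, dots and slashes
for url in re.findall(r'^http://[\w\s./]*', a):
    # str.replace works on Python 2 and 3; translate(None, ' ') is Python 2 only
    print(url.replace(' ', ''))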

To test it on a list of strings:

list_with_statements = ['http://www .test27d .com/site1', 'string_with_no_spaces', 'string has spaces']
new_list = []
for stat in list_with_statements:
    if re.search(r'^http://[\w\s./]*', stat):  # str.startswith() would also work
        stat = stat.replace(' ', '')  # strip spaces from URL strings only
    new_list.append(stat)

Without regex:

list_with_statements = ['http://www .test27d .com/site1', 'string_with_no_spaces', 'string has spaces .']
new_list = []
for stat in list_with_statements:
    if stat.startswith('http'):
        stat = stat.replace(' ', '')  # strip spaces from URL strings only
    new_list.append(stat)
print(new_list)

Outputs

['http://www.test27d.com/site1', 'string_with_no_spaces', 'string has spaces .']
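The loops above assume each string is either a whole URL or plain text. If the URLs are embedded in longer text, as the question describes, something like the following sketch might work. It assumes that a URL starts with http:// or https:// and that stray spaces only occur directly before or after a dot inside it; the names `URL_WITH_GAPS` and `fix_urls` are made up for this illustration.

```python
import re

# Assumption: a URL starts with http:// or https://, and stray spaces
# appear only directly before or after a dot inside the URL.
URL_WITH_GAPS = re.compile(r"https?://[^\s.]+(?:\s*\.\s*[^\s.]+)*")

def fix_urls(text):
    """Remove whitespace inside URL-like matches; leave other text untouched."""
    return URL_WITH_GAPS.sub(lambda m: re.sub(r"\s+", "", m.group(0)), text)

sample = "See http://www .test27d .com/site1 for details. Other text stays ."
print(fix_urls(sample))
# See http://www.test27d.com/site1 for details. Other text stays .
```

As the comments on the question point out, this cannot be fully reliable: if a URL is directly followed by a sentence-ending dot and another word (for example `http://a.com/x . Next`), the pattern will glue that word onto the URL, so the result should be checked by hand.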
mad_
0

Try this:

newstring = string.replace(' ', '')
Chris Fowl