
Let's say I have:

a = r''' Example
This is a very annoying string
that takes up multiple lines
and h@s a// kind{s} of stupid symbols in it
ok String'''

I need a way to replace (or just delete) any text between "This" and "ok" so that when I call it, a now equals:

a = "Example String"

I can't find any wildcards that seem to work. Any help is much appreciated.

cashman04

6 Answers


You need a regular expression:

>>> import re
>>> re.sub('\nThis.*?ok','',a, flags=re.DOTALL)
' Example String'
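The question asks for exactly "Example String", without the leading space that the r''' literal starts with. A minimal sketch, assuming the same input string, that chains str.strip() onto the substitution:

```python
import re

a = r''' Example
This is a very annoying string
that takes up multiple lines
and h@s a// kind{s} of stupid symbols in it
ok String'''

# Remove everything from the newline before "This" through "ok";
# re.DOTALL lets ".*?" cross line breaks, and strip() drops the leading space.
result = re.sub(r'\nThis.*?ok', '', a, flags=re.DOTALL).strip()
print(result)  # Example String
```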
Kabie

Another method is to use string splits:

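def replaceTextBetween(originalText, delimiterA, delimiterB, replacementText):
    # Text before the first delimiterA
    leadingText = originalText.split(delimiterA)[0]
    # Text after the first delimiterB (up to a second delimiterB, if any)
    trailingText = originalText.split(delimiterB)[1]

    # Rebuild the string, keeping both delimiters around the new text
    return leadingText + delimiterA + replacementText + delimiterB + trailingText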

Limitations:

  • Does not check if the delimiters exist
  • Assumes that there are no duplicate delimiters
  • Assumes that delimiters are in the correct order
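The limitations above can be addressed with str.partition, which splits on the first occurrence only and reports whether the separator was found. This is a sketch, not the answer's own code; replace_text_between is a hypothetical name, and raising ValueError on a missing delimiter is an assumed policy:

```python
def replace_text_between(original_text, delimiter_a, delimiter_b, replacement_text):
    """Replace the text between the first delimiter_a and the next delimiter_b.

    Hypothetical helper: raises ValueError if a delimiter is missing or
    if delimiter_b only appears before delimiter_a.
    """
    # partition() splits on the first occurrence only, so later duplicates survive
    leading, found_a, rest = original_text.partition(delimiter_a)
    if not found_a:
        raise ValueError(f"delimiter {delimiter_a!r} not found")
    # Search for delimiter_b only after delimiter_a, which enforces the order
    _, found_b, trailing = rest.partition(delimiter_b)
    if not found_b:
        raise ValueError(f"delimiter {delimiter_b!r} not found after {delimiter_a!r}")
    return leading + delimiter_a + replacement_text + delimiter_b + trailing


text = "start <<old>> middle <<other>> end"
print(replace_text_between(text, "<<", ">>", "new"))  # start <<new>> middle <<other>> end
```

Like the original, this keeps both delimiters in the output; drop them from the return expression if you want them removed too.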
Zachary Canann

Use re.sub: it replaces the text between two characters, symbols, or strings with a desired character, symbol, or string.

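Format: re.sub('A(.*?)B', P, Q, flags=re.DOTALL)
where:
  • A: the starting character, symbol, or string
  • B: the ending character, symbol, or string
  • P: the character, symbol, or string that replaces the match (A and B are replaced too, since they are part of the match)
  • Q: the input string
  • re.DOTALL: lets . match newlines, so the match can span multiple lines
If A or B contains regex special characters such as ( . ? {, pass it through re.escape() first.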
import re
re.sub(r'\nThis(.*?)ok', '', a, flags=re.DOTALL)

Output: ' Example String'

Let's see an example with HTML code as input:

input_string = '''<body> <h1>Heading</h1> <p>Paragraph</p><b>bold text</b></body>'''

Target: remove the <p> tag

re.sub(r'<p>(.*?)</p>', '', input_string, flags=re.DOTALL)

Output: '<body> <h1>Heading</h1> <b>bold text</b></body>'

Target: replace the <p> tag with the word test

re.sub(r'<p>(.*?)</p>', 'test', input_string, flags=re.DOTALL)

Output: '<body> <h1>Heading</h1> test<b>bold text</b></body>'
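Two more things re.sub can do with the same pattern, sketched on an assumed two-paragraph input: the count argument limits how many matches are replaced, and the (.*?) group can be reused in the replacement as \1 to keep the text between the tags:

```python
import re

html = '<body><p>First</p><p>Second</p></body>'

# count=1 limits the substitution to the first match only.
first_only = re.sub(r'<p>(.*?)</p>', 'test', html, count=1, flags=re.DOTALL)

# \1 refers to the (.*?) group, so this swaps the <p> wrapper for <i>
# while keeping each paragraph's text.
rewrapped = re.sub(r'<p>(.*?)</p>', r'<i>\1</i>', html, flags=re.DOTALL)

print(first_only)  # <body>test<p>Second</p></body>
print(rewrapped)   # <body><i>First</i><i>Second</i></body>
```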
Govinda

The DOTALL flag is the key. Ordinarily, the '.' character doesn't match newlines, so a pattern like '.*' can't match across lines in a string. If you set the DOTALL flag, '.' also matches newlines, so '.*' can span as many lines as it needs to.
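A quick way to see the difference, using a made-up three-line string. The inline (?s) prefix is the in-pattern equivalent of flags=re.DOTALL:

```python
import re

text = 'start\nmiddle\nend'

# Without DOTALL, '.' stops at '\n', so no match can span the line breaks.
print(re.search(r'start.*end', text))  # None

# With DOTALL, '.' also matches '\n', so the match covers all three lines.
with_flag = re.search(r'start.*end', text, flags=re.DOTALL)
# (?s) at the start of the pattern turns on the same behavior inline.
inline = re.search(r'(?s)start.*end', text)
print(with_flag.group() == inline.group() == text)  # True
```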

faraday703
a = re.sub('This.*ok', '', a, flags=re.DOTALL)
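Note that '.*' here is greedy: it runs from the first "This" to the last "ok" in the string. With the question's input that makes no difference, but if the text contains several This...ok sections, everything between the first and the last one is removed. A sketch on an assumed input with two sections, compared with the non-greedy '.*?':

```python
import re

b = 'keep This one ok keep This two ok keep'

greedy = re.sub('This.*ok', '', b, flags=re.DOTALL)  # first "This" to LAST "ok"
lazy = re.sub('This.*?ok', '', b, flags=re.DOTALL)   # each "This"..."ok" pair separately

print(repr(greedy))  # 'keep  keep'
print(repr(lazy))    # 'keep  keep  keep'
```

Also, because this pattern starts at "This" rather than at the preceding newline, applying it to the question's string leaves ' Example\n String'.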
Vaughn Cato

If you want only the first and last words:

re.sub(r'^\s*(\w+).*?(\w+)$', r'\1 \2', a, flags=re.DOTALL)
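For comparison, the same result can be had without a regex by splitting on whitespace. This swaps in str.split(), which differs from \w+: tokens keep any punctuation attached to them (a trailing "String." would keep its period), so it is only equivalent when the first and last words are plain:

```python
a = r''' Example
This is a very annoying string
that takes up multiple lines
and h@s a// kind{s} of stupid symbols in it
ok String'''

# str.split() with no arguments splits on any run of whitespace, including
# newlines, and ignores leading/trailing whitespace.
words = a.split()
result = words[0] + ' ' + words[-1]
print(result)  # Example String
```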
JBernardo