
This is probably something simple that I am missing, but I have not been able to find a solution to my issue.

I have two strings that are in the following format:

s1 = '87, 72 Start I am a sentence finish'
s2 = '93, 83 Start I am a sentence end'

Following the answer to "Replace all text between 2 strings python", I am able to remove a phrase when given both a start and an end word, as follows:

import re
s1 = '87, 72 Start I am a sentence finish'
s2 = '93, 83 Start I am a sentence end'

# flags must be passed by keyword; the fourth positional argument of re.sub is count
print(re.sub("Start.*?finish", '', s1, flags=re.DOTALL).strip())
print(re.sub("Start.*?end", '', s2, flags=re.DOTALL).strip())

>>> 87, 72
>>> 93, 83

In my case, I will have conditions where the starting word is the same, but the ending word could be different.

Is it possible to replace the desired phrase by providing only the starting word?

I have tried this, but it only replaces the starting word.

s1 = '87, 72 Start I am a sentence finish'
print(re.sub("Start.*?", '', s1, flags=re.DOTALL).strip())

>>> 87, 72  I am a sentence finish
Wondercricket

4 Answers


Use an end of line anchor $ and greedy matching .*:

print(re.sub("Start.*$", '', s1, flags=re.DOTALL).strip())


Sample code:

import re
p = re.compile(r'Start.*$')  # greedy match from "Start" to the end of the string
test_str = "87, 72 Start I am a sentence finish"
result = re.sub(p, "", test_str).strip()
print(result)

Output:

87, 72
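
When a flag such as re.DOTALL is involved, it helps to remember that the signature is re.sub(pattern, repl, string, count=0, flags=0): a flag given as the fourth positional argument is read as count, not as a flag. A minimal sketch of the difference, using a made-up two-line string (text is an assumption, not taken from the question):

```python
import re

# Hypothetical two-line input, chosen so that DOTALL makes a visible difference.
text = "87, 72 Start first line\nsecond line finish"

# re.DOTALL is an integer flag; used as count it only means "at most 16 replacements".
as_count = re.sub("Start.*", "", text, count=int(re.DOTALL))

# Passed by keyword it really enables DOTALL, so '.' also matches the newline.
as_flag = re.sub("Start.*", "", text, flags=re.DOTALL)

print(repr(as_count))  # '87, 72 \nsecond line finish'
print(repr(as_flag))   # '87, 72 '
```

For the single-line strings in the question both forms give the same result, which is why the problem is easy to overlook.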
Wiktor Stribiżew

You can use "$" to match the "end of line", so "Start.*$" should do it.
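
As a rough sketch of how this could be wrapped up when only the starting word is known: remove_from below is a hypothetical helper name, and the use of re.escape and re.DOTALL are my own assumptions about what is wanted (literal start word, text possibly spanning several lines).

```python
import re

def remove_from(text, start):
    # Hypothetical helper: drop `start` and everything after it.
    # re.escape keeps characters such as '.' or '(' in the start word literal.
    pattern = re.escape(start) + r".*$"
    return re.sub(pattern, "", text, flags=re.DOTALL).strip()

s1 = '87, 72 Start I am a sentence finish'
s2 = '93, 83 Start I am a sentence end'

print(remove_from(s1, "Start"))  # 87, 72
print(remove_from(s2, "Start"))  # 93, 83
```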

Buddy

You can also just drop the ? (the non-greedy modifier) from your regex. A greedy .* already runs to the end of the line, so there is no need for $ here.

print(re.sub("Start.*", '', s1, flags=re.DOTALL).strip())


Input:

'87, 72 Start I am a sentence finish'

Output:

>>> 87, 72
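
To see why the lazy version only removed the starting word, it can help to look at what each pattern actually matches. A small sketch with re.search (the variable names lazy and greedy are just for illustration):

```python
import re

s1 = '87, 72 Start I am a sentence finish'

# With nothing after it, the lazy .*? is satisfied by matching zero characters.
lazy = re.search("Start.*?", s1).group(0)

# The greedy .* takes as many characters as it can, up to the end of the line.
greedy = re.search("Start.*", s1).group(0)

print(repr(lazy))    # 'Start'
print(repr(greedy))  # 'Start I am a sentence finish'
```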
karthik manchala

If you just need the numbers in the beginning of the string, you can use:

import re
s1 = '87, 72 Start I am a sentence finish'
print(re.sub(" Start.*$", '', s1))

Output:

87, 72

Regex explanation:

 Start.*$

Match the character string “ Start” literally « Start»
Match any single character that is NOT a line break character «.*»
   Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Assert position at the end of the string, or before the line break at the end of the string, if any «$»
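
The two rules about line breaks matter once the input has more than one line. A short sketch, assuming a made-up two-line string (not from the question), of how the same pattern behaves with no flags and with re.MULTILINE:

```python
import re

# Hypothetical input where "Start" is not on the last line.
text = "87, 72 Start first line\nsecond line finish"

# No flags: '.' stops at the newline and '$' only matches at the very end,
# so the pattern cannot match and the string is returned unchanged.
no_flags = re.sub(" Start.*$", "", text)

# re.MULTILINE lets '$' match before every newline, so only the rest of
# the first line is removed.
multiline = re.sub(" Start.*$", "", text, flags=re.MULTILINE)

print(repr(no_flags))   # '87, 72 Start first line\nsecond line finish'
print(repr(multiline))  # '87, 72\nsecond line finish'
```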

Regex Demo:

https://regex101.com/r/gV9kJ6/1


Python Demo:

http://ideone.com/XU02Gf

Pedro Lobito