Replace a phrase only if it appears at the beginning of a character string

Question

For example, I need to remove "the ", "and ", "a ", "an ", "this " or "that ", trying them in this order, but only if they are at the beginning of the string:

input ---> "the computer is the machine in charge of data processing processes"

output ---> "computer is the machine in charge of data processing processes"

It is important that once I find that the sentence begins with one of those words, I remove it and do not go on trying to remove the others. In this example, it would detect "the " at the beginning of the string, remove it, and not try the rest of the words.

The only way to conclude that nothing should be removed is to have tried all 6 options ("the ", "and ", "a ", "an ", "this " or "that "): if the input phrase does not begin with any of them, then nothing should be removed.

I've tried something like this, but the problem is that it performs every replacement instead of stopping at the first match.

input_phrase.replace("the ","")

input_phrase = "An airplane is an aircraft with a higher density than the air."
input_phrase = input_phrase.lower()

input_phrase = input_phrase.replace("the ","",1)
input_phrase = input_phrase.replace("and ","",1)
input_phrase = input_phrase.replace("a ","",1)
input_phrase = input_phrase.replace("an ","",1)
input_phrase = input_phrase.replace("this ","",1)
input_phrase = input_phrase.replace("that ","",1)

output_phrase = input_phrase

print(repr(output_phrase))

The problem with that code is that it doesn't remove the word only when it's at the beginning; it removes the first occurrence anywhere in the string. It also runs every .replace() call instead of stopping once one of the matches has been removed.
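To make those two problems concrete, here is a small sketch. The sample phrases are made up for illustration and are not from the original post.

```python
# Problem 1: replace(..., 1) removes the first occurrence wherever it is,
# even when the phrase does not start with that word.
phrase = "computers process the data"
after_single_replace = phrase.replace("the ", "", 1)
print(after_single_replace)  # computers process data

# Problem 2: chaining the calls keeps removing words after the first hit.
chained = "the cat and a dog"
for word in ["the ", "and ", "a ", "an ", "this ", "that "]:
    chained = chained.replace(word, "", 1)
print(chained)  # cat dog
```

Only the leading "the " should have been removed in the second case, which is why a check anchored at the start of the string, stopping at the first match, is needed.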

Not sure about this. Not sure what he wants to have happen with the string "the and this that ". Do they only want one removed, or all of them? Must they be found in the order indicated? — Frank Yellin, Mar 15 '22 at 19:04
@FrankYellin "and then do not continue trying to remove the others" — Kelly Bundy, Mar 15 '22 at 19:05
@FrankYellin Although that just rules out the *others*. Technically unclear what they want for ["The The are an English post-punk band"](https://en.wikipedia.org/wiki/The_The) :-). — Kelly Bundy, Mar 15 '22 at 19:11
The "This question already has answers here" article linked to above clarifies regular expressions but it doesn't provide complete code or alternatives to regex. — Captain Caveman, Mar 15 '22 at 20:10

Cubix48 · Accepted Answer · 2022-03-15T19:20:13.320

5

Here is one way to do so using regex:

import re

input_phrase = "An airplane is an aircraft with a higher density than the air."
output_phrase = re.sub(r"^(the|and|a|an|this|that) ", '', input_phrase, flags=re.IGNORECASE)
print(output_phrase)

The re.IGNORECASE flag allows both An and an to match.
^ asserts the position at the beginning of the string, so the words are never removed from the middle of the phrase.
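As a rough sketch of the same regex idea wrapped in a reusable function: the names `LEADING_WORDS`, `PATTERN` and `strip_leading_word` are made up for this example, and the extra test phrases are assumptions rather than inputs from the question.

```python
import re

# Word list from the question; the helper names are hypothetical.
LEADING_WORDS = ["the", "and", "a", "an", "this", "that"]

# ^ anchors the match at the start of the string; the trailing space ensures
# only whole words match ("there" does not start with "the ").
PATTERN = re.compile(r"^(?:" + "|".join(LEADING_WORDS) + r") ", flags=re.IGNORECASE)

def strip_leading_word(phrase):
    # count=1 is redundant with ^ here, but states the "at most one" intent.
    return PATTERN.sub("", phrase, count=1)

print(strip_leading_word("An airplane is an aircraft"))  # airplane is an aircraft
print(strip_leading_word("The the band"))                # the band
print(strip_leading_word("there is a cat"))              # there is a cat
```

Because the pattern is anchored, only one leading word is removed even when the next word is also in the list.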

Without regex, you can use startswith() and loop through keywords.

input_phrase = "An airplane is an aircraft with a higher density than the air."
keywords = ["the ", "and ", "a ", "an ", "this ", "that "]

output_phrase = input_phrase
for word in keywords:
    if input_phrase.lower().startswith(word):
        output_phrase = input_phrase[len(word):]
        break
print(output_phrase)

break is used to exit the for loop in order not to waste time checking other words.
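The same loop could also be written as a function that returns as soon as a keyword matches, which plays the role of break. This is only a sketch; `remove_leading_keyword` and `KEYWORDS` are names invented for the example.

```python
KEYWORDS = ["the ", "and ", "a ", "an ", "this ", "that "]

def remove_leading_keyword(phrase, keywords=KEYWORDS):
    # Compare in lowercase, but slice the original so its casing is preserved.
    lowered = phrase.lower()
    for word in keywords:
        if lowered.startswith(word):
            # Returning here stops the search at the first match.
            return phrase[len(word):]
    # No keyword matched: leave the phrase untouched.
    return phrase

print(remove_leading_keyword("the computer is the machine"))  # computer is the machine
print(remove_leading_keyword("this and that"))                # and that
print(remove_leading_keyword("Computers are machines"))       # Computers are machines
```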

edited Mar 15 '22 at 19:20

answered Mar 15 '22 at 19:04


`removeprefix` doesn't deal with case. – Frank Yellin Mar 15 '22 at 19:07
@FrankYellin True, but it doesn't work for Python 3.8 and earlier. – Cubix48 Mar 15 '22 at 19:11
@Cubix48 Thank you very much, I was doing some tests, and you helped me a lot with your answer. Regards. – Mar 16 '22 at 00:45
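For context on the `removeprefix` remarks above: `str.removeprefix` exists only in Python 3.9 and later, and it compares case-sensitively. A minimal sketch with an illustrative phrase, assuming Python 3.9+:

```python
phrase = "An airplane is an aircraft"

# The prefix must match exactly, including case, or nothing is removed.
unchanged = phrase.removeprefix("an ")
stripped = phrase.removeprefix("An ")

print(unchanged)  # An airplane is an aircraft
print(stripped)   # airplane is an aircraft
```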

Captain Caveman · Answer 2 · 2022-03-15T19:57:32.403

1

input_phrase = "An airplane is an aircraft with a higher density than the air.".lower()

# Trailing spaces ensure only whole words match (so "the " does not match "there").
words = ["the ", "and ", "a ", "an ", "this ", "that "]

# Default to the unchanged phrase in case no word matches.
output_phrase = input_phrase

# If the phrase starts with any of the words, drop its first word.
if any(input_phrase.startswith(word) for word in words):
    output_phrase = " ".join(input_phrase.split()[1:])

print(output_phrase)

edited Mar 15 '22 at 19:57

answered Mar 15 '22 at 19:25

