Split or partition string after certain words

Question

I searched for several hours before asking here and haven't found an answer.

I have a few strings with the following format (approximated):

"firstword text ONE lastword"
"firstword text TWO lastword"

I need to extract the text after the 'firstword' and before 'ONE' or 'TWO'.

So my output for the aforementioned strings would have to be:

"text"

How do I split or partition the string so I can:

remove the first word (I already know how to do this with str.split(' '))
retain the text that comes before either 'ONE' or 'TWO'. (I thought it would look something like str.split('ONE' | 'TWO'), but that obviously doesn't work, and I haven't managed to find a solution so far.)

If possible, I would like to solve it with split() or partition(), but regex would be fine as well.

Thank you for your help and sorry if this is a dumb question.

Possible duplicate of [find-string-between-two-substrings](https://stackoverflow.com/questions/3368969/find-string-between-two-substrings) — Mayank Porwal, Nov 19 '18 at 12:36

score 5 · Accepted Answer · answered Nov 19 '18 at 12:32

You can use this regex, which combines a positive lookbehind with a positive lookahead:

(?<=firstword)\s*(.*?)\s*(?=ONE|TWO)

Explanation:

(?<=firstword) --> Positive lookbehind to ensure the matched text is preceded by firstword
\s* --> Consumes any whitespace
(.*?) --> Lazily captures your intended data (group 1)
\s* --> Consumes any whitespace
(?=ONE|TWO) --> Positive lookahead to ensure the matched text is followed by ONE or TWO
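
For what it's worth, here is a minimal sketch of how this pattern could be used from Python. The function name `extract_between` is made up for illustration. Note that the wanted text is in group 1, not the whole match, and that `ONE` would also match inside a longer word such as `ONEROUS` unless you add `\b` word boundaries.

```python
import re

# Same pattern as above: lookbehind for "firstword", lazy capture,
# lookahead for either end word.
PATTERN = re.compile(r"(?<=firstword)\s*(.*?)\s*(?=ONE|TWO)")


def extract_between(s):
    """Return the text between 'firstword' and 'ONE'/'TWO', or None if absent."""
    match = PATTERN.search(s)
    # group(1) excludes the surrounding whitespace eaten by the two \s*
    return match.group(1) if match else None


print(extract_between("firstword text ONE lastword"))
print(extract_between("firstword text TWO lastword"))
```

Both calls should print `text`; a string without `ONE` or `TWO` gives `None`.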

Pushpesh Kumar Rajwanshi


This is indeed a good solution. I will accept it as the answer as it solved my specific query. It does leave me wondering how I would solve this with `split()` or `partition()`, though. Is it possible? – remus2232 Nov 19 '18 at 13:09
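
On the `split()` / `partition()` question: it does seem possible without regex. A rough sketch, assuming the start marker is the literal `firstword` and the end markers are `ONE` and `TWO`; the helper name `between` and its parameters are hypothetical, not something from the answers here.

```python
def between(s, start="firstword", ends=("ONE", "TWO")):
    """Return the text after `start` and before the earliest end marker, or None."""
    # partition() returns (before, separator, after); the separator is empty
    # when `start` is not found.
    _, found, rest = s.partition(start)
    if not found:
        return None

    # Cut `rest` at each end marker that is present and keep the shortest
    # head, i.e. the one that stops at the earliest marker.
    heads = []
    for end in ends:
        head, sep, _ = rest.partition(end)
        if sep:
            heads.append(head)
    if not heads:
        return None
    return min(heads, key=len).strip()


print(between("firstword text ONE extra text which needs to be deleted"))
```

This works on substrings, so like the regex it would also cut at `ONE` inside a longer word.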

score 1 · Answer 2 · answered Nov 19 '18 at 12:37

If you split the string on spaces, you get a list of all the words, and you can then pick the one you want by index:

s = "firstword text TWO lastword"
l = s.split(" ")  # l = ["firstword", "text", "TWO", "lastword"]
print(l[1])  # prints "text"

or

s = "firstword text TWO lastword"
print(s.split(" ")[1])

Ali Kargar


The problem with this is that my string can have any length after the `ONE` or `TWO`. I'm looking to remove everything that comes after the `ONE` or `TWO`, which might be 1 word or 10 words. Sorry for not being more specific. A more realistic example of the string I'm working with is `firstword text ONE extra text which needs to be deleted` – remus2232 Nov 19 '18 at 12:49
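
For that case (any number of words after the marker), one option is `re.split` with `maxsplit=1`, which is roughly what `str.split('ONE' | 'TWO')` was trying to express. A hedged sketch follows; `text_before_marker` is a made-up name, and it assumes the first whitespace-separated word is always the one to drop.

```python
import re

# \b keeps ONE/TWO from matching inside longer words such as STONE.
END_WORDS = re.compile(r"\b(?:ONE|TWO)\b")


def text_before_marker(s):
    """Drop the first word and everything from the first ONE/TWO onwards."""
    parts = END_WORDS.split(s, maxsplit=1)
    if len(parts) == 1:
        return None  # neither end word is present
    # parts[0] is everything before the first end word
    words = parts[0].split(None, 1)  # split off the first word only
    return words[1].strip() if len(words) > 1 else ""


print(text_before_marker("firstword text ONE extra text which needs to be deleted"))
```

The length of whatever follows the marker doesn't matter, since only `parts[0]` is used.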

score 1 · Answer 3 · answered Nov 19 '18 at 12:54

Try this:

str_list = ["firstword text ONE lastword", "firstword text TWO lastword", "any text u entered before firstword text ONE", "firstword text TWO any text After"]
end_key_lst = ['ONE', 'TWO']
# Cut each string at its end key, then keep what follows 'firstword'; list() is needed because map() is lazy in Python 3
print(list(map(lambda x: x.split('firstword')[-1].strip(), [''.join(val.split(end_key)[:-1]) for val in str_list for end_key in end_key_lst if end_key in val.split()])))

Result: ['text', 'text', 'text', 'text']

How it works: you may have many strings like these, so I kept them in a list and put the end keys (ONE, TWO) in a second list. I then used a list comprehension and the map function to build the target list.

score 1 · Answer 4 · answered Nov 19 '18 at 12:57

You can use regex like:

import re
string = "firstword text TWO lastword"
re.search(r'firstword\s+(\w+)\s+(?:ONE|TWO)', string).group(1)
'text'

Franco Piccolo


score 1 · Answer 5 · answered Nov 19 '18 at 13:14

Actually there's no need to use regex. You can store required separators into a list and then check if they exist.

orig_text = "firstword text ONE lastword"

first_separator = "firstword"
#Place all "end words" here
last_separators = ["ONE", "TWO"]

output = []

#Splitting the original text into list
orig_text = orig_text.split(" ")

#Checking if there's the "firstword" just in case
if first_separator in orig_text:
    #Here we check if there's "ONE" or "TWO" in the text
    for i in last_separators:
        if i in orig_text:
            #taking everything between "firstword" and "ONE"/"TWO"
            output = orig_text[orig_text.index(first_separator)+1 : orig_text.index(i)]
            break

#Converting to string
output = " ".join(output)

print(output)
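
One caveat with the loop above: it tries the end words in list order, so if both appear in the text it cuts at "ONE" even when "TWO" comes first. A possible variant, assuming you want to stop at whichever end word appears first after "firstword"; the function name `words_between` is only illustrative.

```python
def words_between(text, first="firstword", ends=("ONE", "TWO")):
    """Join the words after `first` up to the earliest end word; '' if not found."""
    words = text.split()
    if first not in words:
        return ""
    start = words.index(first) + 1
    # Indexes of every end word located after the start word, in text order
    stops = [i for i, word in enumerate(words) if i >= start and word in ends]
    if not stops:
        return ""
    return " ".join(words[start:stops[0]])


print(words_between("firstword a TWO b ONE"))  # stops at TWO, the earlier marker
```

Because it compares whole words, "ONE" inside a longer word is not treated as a marker.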

Here are some example outputs:

"firstword text TWO lastword" -> "text"
"firstword hello world ONE" -> "hello world"
"first text ONE" -> ""
"firstword text" -> ""
