2

Let me start by saying that I googled this for several hours before asking here, so posting is something of a last resort.

I have a few strings with the following format (approximated):

"firstword text ONE lastword"
"firstword text TWO lastword"

I need to extract the text after the 'firstword' and before 'ONE' or 'TWO'.

So my output for the aforementioned strings would have to be:

"text"

How do I split or partition the string so I can:

  • remove the first word (I already know how to do this with str.split(' '))
  • retain the text which comes before either 'ONE' or 'TWO'. (I thought it would look something like str.split('ONE' | 'TWO'), but that obviously doesn't work and I haven't managed to find a solution so far.)

If possible, I would like to solve it with split() or partition(), but regex would be fine as well.

Thank you for your help and sorry if this is a dumb question.

remus2232
  • 1
    Possible duplicate of [find-string-between-two-substrings](https://stackoverflow.com/questions/3368969/find-string-between-two-substrings) – Mayank Porwal Nov 19 '18 at 12:36

5 Answers

5

You can use this regex, which combines a positive lookbehind with a positive lookahead:

(?<=firstword)\s*(.*?)\s*(?=ONE|TWO)


Explanation:

  • (?<=firstword) --> Positive lookbehind to ensure the matched text is preceded by firstword
  • \s* --> Eats any white space
  • (.*?) --> Captures your intended data
  • \s* --> Eats any white space
  • (?=ONE|TWO) --> Positive lookahead to ensure the matched text is followed by ONE or TWO
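
A minimal sketch of how this pattern might be used from Python's `re` module. The helper name `extract` and the sample strings are assumptions for illustration, not part of the answer. The wanted text is in capture group 1, because the full match also includes the surrounding whitespace.

```python
import re

# The pattern from the answer, compiled once for reuse.
PATTERN = re.compile(r"(?<=firstword)\s*(.*?)\s*(?=ONE|TWO)")

def extract(s):
    """Return the text between 'firstword' and the first ONE/TWO, or None."""
    match = PATTERN.search(s)
    # Group 1 holds the captured text without the surrounding whitespace.
    return match.group(1) if match else None

print(extract("firstword text ONE lastword"))
print(extract("firstword text TWO extra text which needs to be deleted"))
```

Note that `ONE|TWO` here also matches inside longer words; wrapping it as `\b(?:ONE|TWO)\b` would be one way to require whole words, if that matters for your data.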
Pushpesh Kumar Rajwanshi
  • 1
    This is indeed a good solution. I will accept it as the answer as it solved my specific query. It does leave me wondering how I would solve this with `split()` or `partition()`, though. Is it possible? – remus2232 Nov 19 '18 at 13:09
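
On the `split()` / `partition()` follow-up: it does seem possible. The sketch below is not from any of the answers; `text_between` and its parameters are hypothetical names. It assumes plain substring matching is acceptable, so an end word appearing inside a longer word would also count as a match.

```python
def text_between(s, first="firstword", end_words=("ONE", "TWO")):
    """Return the text after `first` and before the earliest end word, or ""."""
    # partition() returns (before, separator, after); separator is "" if not found.
    _, found, rest = s.partition(first)
    if not found:
        return ""
    candidates = []
    for word in end_words:
        head, sep, _ = rest.partition(word)
        if sep:
            candidates.append(head)
    # The shortest head belongs to the end word that appears earliest.
    return min(candidates, key=len).strip() if candidates else ""

print(text_between("firstword text ONE extra text which needs to be deleted"))
```

Since `partition()` only takes one separator at a time, the loop tries each end word and keeps the earliest cut.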
1

When you split the string on spaces you get a list of all the words, and you can then pick the word you want by index:

s = "firstword text TWO lastword"
l = s.split(" ")  # l = ["firstword", "text", "TWO", "lastword"]
print(l[1])  # l[1] = "text"

or

s = "firstword text TWO lastword"
print(s.split(" ")[1])
Ali Kargar
  • The problem with this is that my string can have any length after the `ONE` or `TWO`. I'm looking to remove everything that comes after the `ONE` or `TWO`; it might be 1 word or 10 words. Sorry for not being more specific. A more realistic example of the string I'm working with is `firstword text ONE extra text which needs to be deleted` – remus2232 Nov 19 '18 at 12:49
1

Try This

str_list = ["firstword text ONE lastword","firstword text TWO lastword","any text u entered before firstword text ONE","firstword text TWO any text After"]
end_key_lst = ['ONE','TWO']
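# For each string containing an end key as a word, keep the part before the key, then the part after 'firstword'
print(list(map(lambda x: x.split('firstword')[-1].strip(), [''.join(val.split(end_key)[:-1]) for val in str_list for end_key in end_key_lst if end_key in val.split()])))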

Result:['text', 'text', 'text', 'text']

How I did this: you may have a number of strings like these, so I kept them in a list and put the end keys (ONE, TWO) in a second list. A list comprehension and the map function then produce the desired target list.

Narendra Lucky
1

You can use regex like:

import re
string = "firstword text TWO lastword"
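# (?:ONE|TWO) is an alternation; [ONE|TWO] would be a character class matching a single character
re.search(r'firstword\s+(\w+)\s+(?:ONE|TWO)', string).group(1)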
'text'
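
The `str.split('ONE' | 'TWO')` idea from the question has a close analogue in `re.split`, which accepts an alternation as the separator. This is only a sketch under assumptions: the helper name `strip_first_and_tail` is made up here, and it assumes the string starts with a first word followed by a space.

```python
import re

def strip_first_and_tail(s):
    """Drop the first word and everything from the first ONE/TWO onward."""
    # Split once at the first whole-word ONE or TWO and keep the left part.
    head = re.split(r"\b(?:ONE|TWO)\b", s, maxsplit=1)[0]
    # Remove the leading word with an ordinary str.split.
    return head.split(" ", 1)[1].strip()

print(strip_first_and_tail("firstword text ONE extra text which needs to be deleted"))
```

If neither end word is present, `re.split` returns the whole string unchanged, so a real implementation would probably want to check for that case.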
Franco Piccolo
1

Actually, there's no need to use regex. You can store the required separators in a list and then check whether each one is present in the split text.

orig_text = "firstword text ONE lastword"

first_separator = "firstword"
#Place all "end words" here
last_separators = ["ONE", "TWO"]

output = []

#Splitting the original text into list
orig_text = orig_text.split(" ")

#Checking if there's the "firstword" just in case
if first_separator in orig_text:
    #Here we check if there's "ONE" or "TWO" in the text
    for i in last_separators:
        if i in orig_text:
            #taking everything between "firstword" and "ONE"/"TWO"
            output = orig_text[orig_text.index(first_separator)+1 : orig_text.index(i)]
            break

#Converting to string
output = " ".join(output)

print(output)

Here's an example of outputs:

"firstword text TWO lastword" -> "text"
"firstword hello world ONE" -> "hello world"
"first text ONE" -> ""
"firstword text" -> ""
OSA413