Segmentation of a string before a certain substring

Question

I want to segment a string into multiple strings, splitting before a certain substring. Example:

string = 'Hello this is a text Hello this is another text Hello this is yet another text'

Splitting before every 'Hello' should give:

string_1 = 'Hello this is a text'
string_2 = 'Hello this is another text'
string_3 = 'Hello this is yet another text'

Using string.split('Hello') removes every 'Hello' from the resulting strings, which I don't want. Does anybody have an idea?

You might want to check regular expressions, and lookaheads specifically. – Tranbi, Aug 10 '21 at 13:55

score 2 · Accepted Answer · answered Aug 10 '21 at 14:02

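You can achieve that with re.split(), the regular-expression counterpart of str.split(), plus some post-processing. The regex '(?=Hello)' is a lookahead: it matches the empty position just before each occurrence of Hello without consuming the Hello itself, so every Hello stays at the start of its piece. Splitting on an empty match like this requires Python 3.7 or later.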

>>> import re
>>> re.split('(?=Hello)', string)
['',
 'Hello this is a text ',
 'Hello this is another text ',
 'Hello this is yet another text']

To get exactly your result, strip the trailing spaces and drop the leading empty string: [x.strip() for x in re.split('(?=Hello)', string) if x.strip()].
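The same idea can be wrapped in a small reusable helper. This is a sketch, and the name split_before is hypothetical. It assumes the marker should be matched literally, so it passes the marker through re.escape() to keep characters such as '(' or '?' from being read as regex syntax. Like the answer above, it needs Python 3.7+.

```python
import re


def split_before(text, marker):
    """Split text before each literal occurrence of marker (hypothetical helper).

    A zero-width lookahead keeps marker at the start of each piece.
    Requires Python 3.7+, where re.split() can split on empty matches.
    """
    pattern = '(?=' + re.escape(marker) + ')'
    # Strip surrounding whitespace and drop empty pieces, such as the ''
    # produced when text begins with marker.
    return [part.strip() for part in re.split(pattern, text) if part.strip()]


string = 'Hello this is a text Hello this is another text Hello this is yet another text'
print(split_before(string, 'Hello'))
# ['Hello this is a text', 'Hello this is another text', 'Hello this is yet another text']
```

Text before the first marker is kept as its own piece, e.g. split_before('x (1) a (1) b', '(1)') gives ['x', '(1) a', '(1) b'].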

score 0 · Answer 2 · answered Aug 10 '21 at 14:02

Try this:

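import re

string = 'Hello this is a text Hello this is another text Hello this is yet another text'

# Start index of every occurrence of 'Hello'
starts = [m.start() for m in re.finditer('Hello', string)]

# Slice from each start to the next one; None means "to the end of the string"
parts = [string[i:j] for i, j in zip(starts, starts[1:] + [None])]
print(parts)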

A few lines were adapted from:

Splitting a string by list of indices

How to find all occurrences of a substring?

Output:

['Hello this is a text ', 'Hello this is another text ', 'Hello this is yet another text']
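The same index-slicing idea also works without re, using str.find() to collect the start positions. In this sketch (the name split_at_occurrences is hypothetical), index 0 is always treated as a cut point, so any text before the first marker is kept and a string without the marker comes back whole; the snippet above does not cover those cases. It assumes non-overlapping occurrences.

```python
def split_at_occurrences(text, marker):
    """Slice text at every non-overlapping occurrence of marker (hypothetical helper)."""
    if not marker:
        raise ValueError('marker must be non-empty')
    # Collect the start index of each occurrence with str.find().
    starts = []
    pos = text.find(marker)
    while pos != -1:
        starts.append(pos)
        pos = text.find(marker, pos + len(marker))
    # Make index 0 a cut point too, so text before the first marker is kept.
    if not starts or starts[0] != 0:
        starts.insert(0, 0)
    # Pair each start with the next one; None slices to the end of the string.
    return [text[i:j] for i, j in zip(starts, starts[1:] + [None])]


string = 'Hello this is a text Hello this is another text Hello this is yet another text'
print(split_at_occurrences(string, 'Hello'))
# ['Hello this is a text ', 'Hello this is another text ', 'Hello this is yet another text']
```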

score 0 · Answer 3 · answered Aug 10 '21 at 14:12

If you don't want to use regex, the following suffices when the string starts with the substring:

string = 'Hello this is a text Hello this is another text Hello this is yet another text'

substr = 'Hello'

result = [substr + s for s in string.split(substr) if s.strip()]

print(result)

The result is:

['Hello this is a text ', 'Hello this is another text ', 'Hello this is yet another text']

Of course, if you want to get rid of the trailing space in each part, you can do this instead:

result = [(substr + s).strip() for s in string.split(substr) if s.strip()]

The result is then:

['Hello this is a text', 'Hello this is another text', 'Hello this is yet another text']
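The comprehension prepends the substring to every piece, including the first one. For input such as 'intro Hello a' it therefore produces 'Hellointro ', and the if s.strip() filter also drops a 'Hello' that is followed only by whitespace. A minimal sketch that handles the first piece separately (the function name split_keep_prefix is hypothetical):

```python
def split_keep_prefix(text, substr):
    """Split text before each substr, keeping substr (hypothetical helper, no regex)."""
    head, *rest = text.split(substr)
    # head is the text before the first substr, so it must not get substr prepended.
    result = [head] if head else []
    # Every later piece was preceded by substr in the original text; re-attach it.
    result.extend(substr + s for s in rest)
    return result


string = 'Hello this is a text Hello this is another text Hello this is yet another text'
print(split_keep_prefix(string, 'Hello'))
# ['Hello this is a text ', 'Hello this is another text ', 'Hello this is yet another text']
```

Apply .strip() to each element afterwards if the trailing spaces are unwanted.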
