Seems like you could achieve that easily without a regex:
string = 'X asdas asdasda, asdasdas asdasda, asdasdasas, asdasddas.'
result = string.lstrip('X ').rstrip('.').split(', ')
should do what you want. Result:
['asdas asdasda', 'asdasdas asdasda', 'asdasdasas', 'asdasddas']
You can also shorten this to
result = string.strip('X .').split(', ')
but this will strip any of the characters 'X', ' ' and '.' from both ends of the string, not just 'X ' from the start and '.' from the end.
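Here is a quick check of that difference. The sample line below is made up for illustration: its last item happens to end in 'X', which the shortened version silently removes.

```python
line = 'X abc, def X.'

# lstrip/rstrip only touch the side they are called on
one_sided = line.lstrip('X ').rstrip('.').split(', ')

# strip removes any of 'X', ' ' and '.' from BOTH ends,
# so the trailing ' X' before the period is eaten too
both_sided = line.strip('X .').split(', ')

print(one_sided)   # ['abc', 'def X']
print(both_sided)  # ['abc', 'def']
```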
If you have your whole text in one multi-line string, you can still do it in one line with a list comprehension:
text = '''X abc, abd.
X abc, abd, abcd.
X abc abd, abc.
X asdas, asdasd, adsasda, asdasda.
X asdas asdasda, asdasdas asdasda, asdasdasas, asdasddas.'''
result = [t.strip('X .').split(', ') for t in text.splitlines()]
Result:
[['abc', 'abd'],
['abc', 'abd', 'abcd'],
['abc abd', 'abc'],
['asdas', 'asdasd', 'adsasda', 'asdasda'],
['asdas asdasda', 'asdasdas asdasda', 'asdasdasas', 'asdasddas']
]
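One thing to watch for: if the text contains empty lines, each of them ends up as [''] in the result. A minimal sketch of skipping them, assuming blank or whitespace-only lines should simply be ignored:

```python
text = '''X abc, abd.

X abc abd, abc.'''

# Without a filter, the empty middle line becomes ['']
naive = [t.strip('X .').split(', ') for t in text.splitlines()]

# Skipping lines that are empty or whitespace-only avoids that
filtered = [t.strip('X .').split(', ') for t in text.splitlines() if t.strip()]

print(naive)     # [['abc', 'abd'], [''], ['abc abd', 'abc']]
print(filtered)  # [['abc', 'abd'], ['abc abd', 'abc']]
```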
Please note: This only works if the text you want to keep doesn't itself start or end with any of the stripped characters (for strip('X .') that is 'X', space and '.'). This is because strip doesn't mean "remove this substring from the ends of the string", but instead "remove any characters from the given set of characters from the ends of the string".
If, for example, the prefix looked like this:
line = 'asdasX asdas asdasda, asdasdas asdasda, asdasdasas, asdasddas.'
the above approach would not work.
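To see why, here is what strip actually does with that line when given the prefix characters. Since 'asdasX .' is treated as a set of characters, strip keeps eating into the data until it reaches a character outside the set:

```python
line = 'asdasX asdas asdasda, asdasdas asdasda, asdasdasas, asdasddas.'

# 'asdasX .' is the character set {a, s, d, X, ' ', '.'},
# so strip only stops when it hits a ',' on each side
broken = line.strip('asdasX .')

print(broken)              # ', asdasdas asdasda, asdasdasas,'
print(broken.split(', '))  # ['', 'asdasdas asdasda', 'asdasdasas,']
```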
Instead, you could trim the string based on the length of your patterns, optionally after verifying that it actually starts and ends with them:
prefix = 'asdasX '
if line.startswith(prefix) and line.endswith('.'):
    result = line[len(prefix):-1].split(', ')
Result:
['asdas asdasda', 'asdasdas asdasda', 'asdasdasas', 'asdasddas']
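If you need this in several places, it can be wrapped in a small helper. This is a sketch; the name parse_line and its defaults are hypothetical, and returning None for non-matching lines is just one possible choice:

```python
from typing import List, Optional


def parse_line(line: str, prefix: str = 'asdasX ', suffix: str = '.') -> Optional[List[str]]:
    """Return the items between prefix and suffix, or None if the line doesn't match."""
    if not (line.startswith(prefix) and line.endswith(suffix)):
        return None
    # Slice by the pattern lengths instead of a hard-coded index like 7
    end = len(line) - len(suffix)
    return line[len(prefix):end].split(', ')


print(parse_line('asdasX abc, abd.'))  # ['abc', 'abd']
print(parse_line('X abc, abd.'))       # None
```

Computing the slice bounds from len(prefix) and len(suffix) means the code keeps working if the patterns change.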
Or again, as a list comprehension:
text = '''asdasX abc, abd.
asdasX abc, abd, abcd.
asdasX abc abd, abc.
asdasX asdas, asdasd, adsasda, asdasda.
asdasX asdas asdasda, asdasdas asdasda, asdasdasas, asdasddas.'''
result = [t[7:-1].split(', ') for t in text.splitlines() if t.startswith('asdasX ') and t.endswith('.')]
Result:
[['abc', 'abd'],
['abc', 'abd', 'abcd'],
['abc abd', 'abc'],
['asdas', 'asdasd', 'adsasda', 'asdasda'],
['asdas asdasda', 'asdasdas asdasda', 'asdasdasas', 'asdasddas']
]
On Python 3.9 and newer you can use the str.removeprefix and str.removesuffix methods, which remove an entire substring (not a set of characters) from the start or end of the string:
result = [t.removeprefix('asdasX ').removesuffix('.').split(', ') for t in text.splitlines()]
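On older Python versions you can write the equivalent yourself. A minimal sketch (the function names remove_prefix and remove_suffix are hypothetical, chosen to mirror the built-in methods):

```python
def remove_prefix(s: str, prefix: str) -> str:
    # Mirrors str.removeprefix from Python 3.9+
    if prefix and s.startswith(prefix):
        return s[len(prefix):]
    return s


def remove_suffix(s: str, suffix: str) -> str:
    # Guard against an empty suffix: s[:-0] would return ''
    if suffix and s.endswith(suffix):
        return s[:-len(suffix)]
    return s


line = 'asdasX asdas asdasda, asdasdas asdasda, asdasdasas, asdasddas.'
result = remove_suffix(remove_prefix(line, 'asdasX '), '.').split(', ')
print(result)  # ['asdas asdasda', 'asdasdas asdasda', 'asdasdasas', 'asdasddas']
```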
See this SO post.