I have a text file, and I need to extract everything in it that sits between two '*'s. There can be multiple such occurrences. How would I do that using Regex? I am good at Python, but I haven't used Regex a lot, so it's my weakness.
- I have tried quite a few variants but couldn't make it work. As I said, I'm not good at Regex, so I couldn't come up with anything substantial. – Varun Shah Nov 16 '13 at 09:26
- It's okay. Josha has answered beautifully, with the documentation links and an explanation of the pattern too. :) – shad0w_wa1k3r Nov 16 '13 at 09:30
- Yupe. Thanks a lot for the help, guys! Cheers to SO :) – Varun Shah Nov 19 '13 at 02:21
1 Answer
Notes:
- Use the documentation; it's really helpful!
- `*` is normally a "0 or more" quantifier, so you'll need to escape it with `\` to match a literal `*`.
- `.` is an "any" search and will match every character except newlines! To include newlines, add the `re.DOTALL` flag.
- `+` means "at least one", and it is greedy, meaning that it would normally capture everything between the first `*` and the last `*` (including any `*`s in between). To prevent it from being greedy, we add `?` right after it (`+?`), which makes it stop at the first `*` it encounters after the opening one.
- `()` creates a capturing group. With one group in the pattern, `findall` returns only the text inside the parentheses, so the `*`s themselves are not included in the results.
And here is the full pattern in action:
```python
import re
pattern = re.compile(r'\*(.+?)\*', flags=re.DOTALL)
text = """Why hello *there my fine
fellow!* How for art thou
on *such a glorious day?*"""
results = pattern.findall(text)
# ['there my fine\nfellow!', 'such a glorious day?']
```
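The question mentions a text file, while the example above works on a string literal. Here is a minimal sketch of applying the same pattern to a file, assuming the file is UTF-8 and small enough to read into memory. The helper name `extract_starred` and the sample file are hypothetical, created only so the snippet runs on its own.

```python
import re
import tempfile
from pathlib import Path

# Same pattern as above: escaped stars, lazy .+?, DOTALL so matches may span lines.
STARRED = re.compile(r'\*(.+?)\*', flags=re.DOTALL)


def extract_starred(path):
    """Return every substring found between pairs of '*' in the file at `path`.

    Hypothetical helper: reads the whole file into memory as UTF-8.
    """
    text = Path(path).read_text(encoding="utf-8")
    return STARRED.findall(text)


# Write a throwaway sample file so the sketch runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_text("Why hello *there my fine\nfellow!* and *more*", encoding="utf-8")
    results = extract_starred(sample)

print(results)  # ['there my fine\nfellow!', 'more']
```

In real use you would just call `extract_starred("your_file.txt")` with your own path.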

Josha Inglis
I am not sure whether the output for `text = '*one***two*'` is really the desired result. – Hyperboreus Nov 16 '13 at 09:05
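The comment above points at a real edge case: because `.` also matches `*`, a run like `***` can leak a star into a capture. One possible alternative, assuming stars never appear inside the wanted text, is to swap `.+?` for the negated class `[^*]+`. Whether `['one', 'two']` is the "right" answer for this input depends on what the asker actually wants, so treat this as one reading, not the definitive fix.

```python
import re

text = '*one***two*'

# Original pattern: '.' also matches '*', so the second match starts at the
# middle star of '***' and pulls the third star into its capture.
lazy_dot = re.findall(r'\*(.+?)\*', text)

# Alternative: [^*]+ refuses to match '*', so a capture never contains a star.
# A negated class also matches newlines, so re.DOTALL is not needed here.
no_star = re.findall(r'\*([^*]+)\*', text)

print(lazy_dot)  # ['one', '*two']
print(no_star)   # ['one', 'two']
```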