I have a text file, and I need to extract everything in it that sits between two '*' characters. There can be multiple such occurrences. How would I do that using regex? I am good at Python, but I haven't used regex much, so it's my weakness.

Varun Shah

1 Answer

Notes:

  1. Read the documentation for the re module; it's really helpful!
  2. * is normally a quantifier meaning "zero or more of the preceding element", so to match a literal asterisk you need to escape it as \*
  3. . matches any single character except a newline. To make it match newlines as well, add the re.DOTALL flag
  4. + means "one or more", and it is greedy: on its own, .+ would capture everything between the first * and the last * (including any *'s in between). Adding ? after it (.+?) makes it lazy, so the match ends at the first closing * it can reach.
  5. () defines a capture group. When the pattern contains a single group, findall returns only the text matched inside the parentheses, so the surrounding *'s are dropped.
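
Notes 3 and 4 are easy to check by hand. This is only a minimal sketch: the sample string and the variable names are made up for illustration.

```python
import re

# Hypothetical sample: two starred spans, the second one crossing a newline.
sample = "a *one* b *two\nthree* c"

# Greedy: .+ runs to the last '*', swallowing the asterisks in between.
greedy = re.findall(r'\*(.+)\*', sample, flags=re.DOTALL)
# ['one* b *two\nthree']

# Lazy: .+? ends at the first closing '*' it can reach.
lazy = re.findall(r'\*(.+?)\*', sample, flags=re.DOTALL)
# ['one', 'two\nthree']

# Without re.DOTALL, '.' cannot cross the newline, so the second span is missed.
lazy_no_dotall = re.findall(r'\*(.+?)\*', sample)
# ['one']
```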

And here is an example of that in action:

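import re

# \* matches a literal asterisk; (.+?) lazily captures the text in between.
# re.DOTALL lets . match newlines, so a span may cover several lines.
pattern = re.compile(r'\*(.+?)\*', flags=re.DOTALL)
text = """Why hello *there my fine
fellow!* How for art thou
on *such a glorious day?*"""

# findall returns only the captured group for each match.
results = pattern.findall(text)
# ['there my fine\nfellow!', 'such a glorious day?']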
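
Since the question starts from a text file, the same idea can be applied to the file's contents. What follows is a sketch under a few assumptions: the file is small enough to read in one go, it is UTF-8 encoded, and the helper names extract_starred and extract_starred_from_file are hypothetical. It also swaps in a negated character class, [^*]+, in place of the lazy .+? used above. This is an alternative, not the method from the example: [^*] matches any character that is not an asterisk, newlines included, so neither the ? nor re.DOTALL is needed.

```python
import re
from pathlib import Path

# One or more non-asterisk characters enclosed by a pair of literal asterisks.
STARRED = re.compile(r'\*([^*]+)\*')


def extract_starred(text):
    """Return every non-empty run of text enclosed by a pair of '*'."""
    return STARRED.findall(text)


def extract_starred_from_file(path, encoding="utf-8"):
    """Read the whole file (assumed small) and extract the starred spans."""
    return extract_starred(Path(path).read_text(encoding=encoding))
```

Calling extract_starred on the text from the example above should give the same two strings as pattern.findall(text).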
Josha Inglis