
I am using Python to extract some data from the Patent Office. I would like to use a regular expression to extract the first claim from the claim text. The text string begins with "1.", includes any number of letters, digits, and symbols, and continues up to "2.", which should not be included. What Python regular expression would match the text from "1." up to but not including "2."?

I tried

p=re.compile(r"/.+?(?=2)/")

and then ran a search using that object against the text string but received "None".

Barmar
Don

1 Answer


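Your lookahead is missing the escaped dot, so (?=2) stops at any 2, not just at "2.". Use (?=2\.) instead; the dot must be escaped because an unescaped . matches any character. For the same reason, start the pattern with 1\. so the match begins at "1.".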
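A small sketch of the difference, using a made-up claim text (an assumption, not real patent data) where claim 1 contains the digit 2 inside "12 mm":

```python
import re

# Hypothetical sample: claim 1 contains a "2" that is not a claim number.
text = "1. A bolt of 12 mm length. 2. The bolt of claim 1, made of steel."

# Lookahead on a bare 2: stops before the first digit 2 anywhere.
loose = re.search(r"1\..+?(?=2)", text)
# Lookahead on an escaped "2.": stops only where "2." appears.
strict = re.search(r"1\..+?(?=2\.)", text)

print(repr(loose.group()))   # '1. A bolt of 1'
print(repr(strict.group()))  # '1. A bolt of 12 mm length. '
```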

If the text can span multiple lines, you'll need the re.DOTALL flag so that . (and therefore .+) also matches newlines.
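A minimal sketch of what the flag changes, assuming a hypothetical claim 1 that wraps onto a second line:

```python
import re

# Hypothetical claim text where claim 1 spans two lines.
text = "1. A widget comprising:\na spring.\n2. The widget of claim 1."

pattern = r"1\..+?(?=2\.)"

# Without DOTALL, "." cannot cross "\n", so the match never reaches "2.".
without_flag = re.search(pattern, text)
# With DOTALL, "." also matches "\n".
with_flag = re.search(pattern, text, flags=re.DOTALL)

print(without_flag)             # None
print(repr(with_flag.group()))  # '1. A widget comprising:\na spring.\n'
```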

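You don't put / around regular expressions in Python. Unlike JavaScript or Perl, Python has no regex delimiters, so the slashes are treated as literal characters. In your pattern the trailing / has to match at the exact position where (?=2) just required a 2, so the pattern can never match, which is why you got None.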
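A quick check of that behaviour; the sample strings here are made up for illustration:

```python
import re

# Slashes are ordinary characters in a Python pattern.
slash_literal = re.search(r"/a/", "path/a/b")
print(slash_literal.group())  # /a/

# The original pattern needs a "/" where the lookahead has just
# required a "2", so it fails even when the text contains slashes.
original = re.search(r"/.+?(?=2)/", "/1. foo 2./")
print(original)  # None
```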

import re
p = re.compile(r'1\..+?(?=2\.)', flags=re.DOTALL)
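One way to wrap the compiled pattern in a helper. The function name extract_first_claim, the .strip() call, and the sample claims are assumptions for illustration:

```python
import re

# Pattern from the answer: "1." then as little as possible up to "2."
FIRST_CLAIM = re.compile(r"1\..+?(?=2\.)", flags=re.DOTALL)

def extract_first_claim(claims_text):
    """Return claim 1 without the following "2.", or None if no match.

    Hypothetical helper; .strip() removes the whitespace left before "2.".
    """
    match = FIRST_CLAIM.search(claims_text)
    return match.group().strip() if match else None

claims = """1. A fastener comprising:
a threaded shaft; and
a head.
2. The fastener of claim 1, wherein the head is hexagonal."""

print(extract_first_claim(claims))
# 1. A fastener comprising:
# a threaded shaft; and
# a head.
```

Keep in mind that a patent with only one claim has no "2.", so this returns None for it, and a claim 1 that itself contains something like "2.5 mm" would be cut short at that point.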


Barmar
  • Awesome - thanks! I'm not new to coding, but new to Python, so that was a huge help! – Don Aug 31 '23 at 19:39