Regular expression to extract first claim of patent text

Question

I am using Python to extract some data from the Patent Office, and I would like to use a regular expression to extract the first claim from the claim text. The text string begins with "1.", includes any number of letters, digits, and symbols, and runs up to, but not including, "2.". What Python regular expression would match the text from "1." up to but not including "2."?

I tried

p=re.compile(r"/.+?(?=2)/")

and then ran a search using that object against the text string but received "None".

score -1 · Accepted Answer · answered Aug 31 '23 at 16:53

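You're missing the "." in your lookahead, so it will stop before any "2", not just before "2.". The dot has to be escaped as \. so that it matches a literal period rather than any character.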

If the text can be multiple lines, you'll need to use the re.DOTALL flag so .+ will match newlines.

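You don't put / around regular expressions in Python. Inside a Python pattern string the slashes are literal characters that must appear in the text, which is why your search returned None.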

p = re.compile(r'1\..+?(?=2\.)', flags=re.DOTALL)
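To see the difference, here is a minimal sketch that runs both patterns against a made-up, multi-line claim string. The sample text and the names claims, broken, and first_claim are assumptions for illustration only; real claim text from the Patent Office will look different.

```python
import re

# Hypothetical sample text, not real patent data.
claims = (
    "1. A widget comprising a housing\n"
    "and a lid attached to the housing.\n"
    "2. The widget of claim 1, wherein the lid is hinged."
)

# Original attempt: the slashes are literal characters, so nothing matches.
broken = re.compile(r"/.+?(?=2)/")
print(broken.search(claims))  # None

# Pattern from the answer: escaped dots, plus DOTALL so "." also matches newlines.
p = re.compile(r"1\..+?(?=2\.)", flags=re.DOTALL)
m = p.search(claims)

# search() returns None when there is no match, so check before calling group().
first_claim = m.group(0).strip() if m else None
print(first_claim)
```

Because the match is lazy, it stops at the first "2." it finds, so a "2." inside claim 1 itself (a measurement such as "2.5 mm", for example) would cut the result short. If that can happen in your data, the lookahead would need to be made stricter.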

answered Aug 31 '23 at 16:53

Barmar


Awesome - thanks! I'm not new to coding, but new to Python, so that was a huge help! – Don Aug 31 '23 at 19:39
