Python regex tested correctly but not giving expected result

Question

I am trying to extract everything between the year (a four-digit number) and the word "vehicles".

I tested the regex below on regexr, and it highlighted the phrase I wanted to extract (Civic Type R in the example below):

`(?<=[1-2][0-9]{3}\s)(.+)(?=\svehicles)`

"2023 Civic Type R vehicles. The driver's seat frame is wrong."

So I used the code below to extract it into a new column:

`df['newcol'] = df['colA'].str.extract(r'(?<=[1-2][0-9]{3}\s)(.+)(?=\svehicles)', expand=False)`

However, this gives me the full sentence in the new column instead of just `Civic Type R`. What am I doing wrong, and why do regexr and JupyterLab give different outputs?

Update:

I found the cause: some rows contain a second instance of "vehicles" later in the text, which I wasn't aware of.
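A minimal sketch of why a second "vehicles" changes the result, using the standard `re` module (pandas' `str.extract` applies the same regular-expression semantics). The second sentence in `text` is made up for illustration. The greedy `.+` first runs to the end of the string and then backtracks only as far as needed for the lookahead to succeed, so it stops at the *last* " vehicles"; the lazy `.+?` grows one character at a time and stops at the *first* one.

```python
import re

# Hypothetical sample: the question's sentence plus an invented second
# sentence that also contains the word "vehicles".
text = ("2023 Civic Type R vehicles. The driver's seat frame is wrong. "
        "This affects other vehicles too.")

greedy = r'(?<=[1-2][0-9]{3}\s)(.+)(?=\svehicles)'   # original pattern
lazy = r'(?<=[1-2][0-9]{3}\s)(.+?)(?=\svehicles)'    # only change: .+ -> .+?

# Greedy: backtracks from the end, so it stops before the LAST " vehicles".
greedy_match = re.search(greedy, text).group(1)

# Lazy: expands from the start, so it stops before the FIRST " vehicles".
lazy_match = re.search(lazy, text).group(1)

print(greedy_match)
print(lazy_match)
```

With a single "vehicles" in the string, both patterns return the same text, which would explain why the regexr test on one sentence looked correct.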

How can I modify my regex so it captures only up to the first instance of the word?

Thanks

Works as expected for me and returned `Civic Type R`. Pandas 1.5.2, Python 3.10.9, Jupyter Lab 3.5.2 — It_is_Chris, Apr 18 '23 at 16:01
@It_is_Chris I closed jupyter lab and re-ran my file. Still not working. Python 3.10.9, Pandas 1.5.3, Jupyter Lab 3.5.3. It is giving me expected results for some rows and not others. — Sid_J, Apr 18 '23 at 16:10
I tried the `re` module instead of pandas; you can try it as well. I ran this: `import re; re.search(r'(?<=[1-2][0-9]{3}\s)(.+)(?=\svehicles)', "2023 Civic Type R vehicles. The driver's seat frame is wrong.").groups()[0]` and got this: `'Civic Type R'`. Tested on Python 3.8.0. — Yossi Levi, Apr 18 '23 at 16:11

score 0 · Answer 1 · answered Apr 18 '23 at 17:23


Thanks to this question, I was able to find the answer. I needed a lazy (non-greedy) match, `(.+?)` instead of `(.+)`, so that it stops after the first instance of the word "vehicles".
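A sketch of the fix applied to the question's `str.extract` call. The DataFrame contents below are hypothetical; only the first row mirrors the question's example. Because `str.extract` returns only the capture group, the lookarounds are optional: the second pattern (a variation, not from the original answer) matches the year and " vehicles" as ordinary, non-captured parts and yields the same column here.

```python
import pandas as pd

# Hypothetical data; the first row contains "vehicles" twice.
df = pd.DataFrame({'colA': [
    "2023 Civic Type R vehicles. The driver's seat frame is wrong. Other vehicles are fine.",
    "2022 Accord Hybrid vehicles. Some text without the keyword again.",
]})

# Lazy (.+?) stops at the first " vehicles" after the year.
df['newcol'] = df['colA'].str.extract(
    r'(?<=[1-2][0-9]{3}\s)(.+?)(?=\svehicles)', expand=False)

# Same idea without lookarounds: only the group is returned by str.extract.
df['newcol2'] = df['colA'].str.extract(
    r'[1-2][0-9]{3}\s(.+?)\svehicles', expand=False)

print(df[['newcol', 'newcol2']])
```

With `expand=False` and a single group, `str.extract` returns a Series, so it can be assigned directly to a new column.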


Sid_J


Your answer could be improved with additional supporting information. Please [edit] to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Apr 19 '23 at 06:39
