
I am trying to extract everything between the four-digit year and the word "vehicles".

I tested the regex below on regexr, and it highlighted exactly the phrase I wanted to extract (`Civic Type R` in the sample sentence below):

(?<=[1-2][0-9]{3}\s)(.+)(?=\svehicles)

"2023 Civic Type R vehicles. The driver's seat frame is wrong."

So I used the below code to extract it into a new column:

df['newcol'] = df['colA'].str.extract(r'(?<=[1-2][0-9]{3}\s)(.+)(?=\svehicles)', expand=False)

However, this gives me the full sentence in my new column instead of just `Civic Type R`. What am I doing wrong, and why do regexr and Jupyter Lab produce different outputs?

Update:

I found that the problem occurs because some rows contain another instance of "vehicles" further down the sentence, which I wasn't aware of. The greedy `.+` runs to the last "vehicles" instead of the first.
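A minimal reproduction with the standard `re` module, in case it helps others see the behaviour. The second sentence here is a hypothetical example I made up to contain "vehicles" twice; it is not a row from my data.

```python
import re

# Same pattern as in the question: greedy (.+) between the year and "vehicles"
pattern = r'(?<=[1-2][0-9]{3}\s)(.+)(?=\svehicles)'

# Hypothetical sentence with a second "vehicles" later on (not from the real data)
text = "2023 Civic Type R vehicles. The seat frame on these vehicles is wrong."

# Greedy .+ grabs as much as possible, then backtracks only to the LAST " vehicles"
greedy = re.search(pattern, text).group(1)
print(greedy)  # Civic Type R vehicles. The seat frame on these
```

With only one "vehicles" in the sentence, the same pattern returns `Civic Type R`, which is why it worked on regexr and for the commenters.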

How can I modify my regex to only capture until the first instance of the word?

Thanks

Sid_J
  • Works as expected for me and returned `Civic Type R` Pandas 1.5.2, Python 3.10.9, Jupyter Lab 3.5.2 – It_is_Chris Apr 18 '23 at 16:01
  • Works for me as well (Python 3.8.13, pandas 1.2.4) – Seb Apr 18 '23 at 16:07
  • @It_is_Chris I closed jupyter lab and re-ran my file. Still not working. Python 3.10.9, Pandas 1.5.3, Jupyter Lab 3.5.3. It is giving me expected results for some rows and not others. – Sid_J Apr 18 '23 at 16:10
  • Tried with package re instead of pandas. You can try it as well. ran this: `import re; re.search(r'(?<=[1-2][0-9]{3}\s)(.+)(?=\svehicles)', "2023 Civic Type R vehicles. The driver's seat frame is wrong.").groups()[0]` get this: `'Civic Type R'`. Tested on python 3.8.0. – Yossi Levi Apr 18 '23 at 16:11
  • Please add the regex to code block as well. – Hasnat Apr 18 '23 at 18:59

1 Answer

Thanks to this question I was able to find the answer. I needed a lazy match (`.+?` instead of `.+`) so that it stops at the first instance of the word "vehicles".
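Here is a small sketch of the change, using the standard `re` module so it runs without a DataFrame. The second sample sentence is a hypothetical one with two occurrences of "vehicles", not a real row from my data.

```python
import re

# Only change from the question's pattern: (.+) became the lazy (.+?)
lazy_pattern = r'(?<=[1-2][0-9]{3}\s)(.+?)(?=\svehicles)'

samples = [
    "2023 Civic Type R vehicles. The driver's seat frame is wrong.",
    # Hypothetical sentence with a second "vehicles" (assumption, not real data)
    "2023 Civic Type R vehicles. The seat frame on these vehicles is wrong.",
]

# Lazy .+? expands one character at a time and stops at the FIRST " vehicles"
results = [re.search(lazy_pattern, s).group(1) for s in samples]
print(results)  # ['Civic Type R', 'Civic Type R']
```

The same pattern string should work in the original call, i.e. `df['colA'].str.extract(lazy_pattern, expand=False)`, since pandas uses Python's `re` engine for `str.extract`.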

Sid_J