
I have a sentence like this: "Give me 4 of ABCD_X and then do something"

I need to extract ABCD_X, i.e. any sequence of characters after "Give me 4 of " and before the next space. The number can be any size.

I am able to do it with this expression (taken from this question):

(?<=^Give me \d of )(.*?)(?=\s)

But the number can be 10 or greater, and (?<=^Give me \d+ of )(.*?)(?=\s) raises an error in Python (on a pandas column) saying that a look-behind requires a fixed-width pattern.

Is there a way to extract those characters without a positive look-behind?
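For reference, a minimal sketch with Python's standard re module (no pandas) that reproduces the problem: the single-digit look-behind compiles, while the \d+ version is rejected at compile time because \d+ has no fixed width. The helper name compiles is hypothetical, just for illustration.

```python
import re

def compiles(pattern):
    """Return True if Python's re module accepts the pattern, else print the error and return False."""
    try:
        re.compile(pattern)
        return True
    except re.error as exc:
        print(f"{pattern!r}: {exc}")
        return False

fixed = r'(?<=^Give me \d of )(.*?)(?=\s)'      # \d matches exactly one digit, so the look-behind has a fixed width
variable = r'(?<=^Give me \d+ of )(.*?)(?=\s)'  # \d+ can match any number of digits, so the width is not fixed

print(compiles(fixed))     # True
print(compiles(variable))  # prints the re.error message, then False
```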

tripleee
YevKad

1 Answer


You could try:

^Give me \d+ of (\S+)



  • ^ - Start-of-line anchor.
  • Give me \d+ of - Your search string, matched literally, with 1+ digits in place of the number.
  • (\S+) - A capture group with 1+ non-whitespace characters.
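The same pattern also works with the standard re module outside pandas, because the variable-width \d+ is now part of the consumed match rather than a look-behind. A minimal sketch, assuming one sentence per string; the function name extract_item is hypothetical.

```python
import re

# The digits are matched normally (not inside a look-behind), so \d+ is allowed.
PATTERN = re.compile(r'^Give me \d+ of (\S+)')

def extract_item(sentence):
    """Return the run of non-whitespace after 'Give me <n> of ', or None if there is no match."""
    m = PATTERN.match(sentence)
    return m.group(1) if m else None

print(extract_item('Give me 4 of ABCD_X and then do something'))   # ABCD_X
print(extract_item('Give me 10 of ABCD_Y and then do something'))  # ABCD_Y
print(extract_item('Give me 123 of ABCD_Z'))                       # ABCD_Z
print(extract_item('Take 4 of ABCD_X'))                            # None
```

Unlike the original (.*?)(?=\s), the (\S+) group does not need whitespace after the item, so it also matches when the item ends the string.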

For example:

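import pandas as pd

# Sample data: one sentence per row
s = pd.Series(['Give me 4 of ABCD_X and then do something', 'Give me 10 of ABCD_Y and then do something'])
# str.extract returns a DataFrame with one column per capture group
df = s.str.extract(r'^Give me \d+ of (\S+)')
print(df)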

Prints:

        0
0  ABCD_X
1  ABCD_Y

Note: If you use a named capture group, the column header will be the name of that group instead of its integer index.
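A minimal sketch of the named-group variant, using Python's (?P<name>...) syntax with pandas Series.str.extract; the group name item is an arbitrary, hypothetical choice.

```python
import pandas as pd

s = pd.Series(['Give me 4 of ABCD_X and then do something',
               'Give me 10 of ABCD_Y and then do something'])

# (?P<item>...) names the capture group; str.extract uses that name as the column label
df = s.str.extract(r'^Give me \d+ of (?P<item>\S+)')
print(df)
print(df['item'].tolist())  # ['ABCD_X', 'ABCD_Y']
```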

JvdV
    This works perfectly, thanks a lot! Especially for the detailed explanation and testing it on pandas! – YevKad Sep 21 '21 at 07:03