RegEx, characters between 2 strings without positive lookbehind

Question

I have a sentence like this: Give me 4 of ABCD_X and then do something

I need to extract ABCD_X, that is, any run of characters after "Give me 4 of " and before the next space. The number can have any number of digits.

I am able to do it with this expression (taken from this question):

(?<=^Give me \d of )(.*?)(?=\s)

But the number can be 10 or greater, and (?<=^Give me \d+ of )(.*?)(?=\s) raises an error in Python (on a pandas column) saying that a positive lookbehind must be fixed width.
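The fixed-width restriction can be reproduced outside pandas with Python's standard re module. This is only a minimal sketch; the variable names are illustrative and not part of the original question:

```python
import re

# \d matches exactly one digit, so the lookbehind has a fixed width
fixed = r'(?<=^Give me \d of )(.*?)(?=\s)'
# \d+ matches one or more digits, so the lookbehind width is variable
variable = r'(?<=^Give me \d+ of )(.*?)(?=\s)'

text = 'Give me 4 of ABCD_X and then do something'
fixed_result = re.search(fixed, text).group(1)

try:
    re.compile(variable)
    error_message = None
except re.error as exc:
    # Python's re module rejects variable-width lookbehinds at compile time
    error_message = str(exc)

print(fixed_result)
print(error_message)
```

The first pattern compiles and extracts the token for a single-digit number, while the second one fails before any matching takes place.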

Is there a way to avoid the positive lookbehind and still extract those characters?

Instead of a lookbehind, you could just use a non-capturing group ```(?:^Give me...)```. — sj95126, Sep 21 '21 at 05:54

JvdV · Accepted Answer · 2021-09-21T06:01:25.800


You could try:

^Give me \d+ of (\S+)

See an online demo

^ - Start line anchor.
Give me \d+ of - Your literal search string, with 1+ digits.
(\S+) - A capture group with 1+ non-whitespace characters.
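The same pattern can also be tried with the plain re module, outside pandas. This is a rough sketch; extract_item is a hypothetical helper name introduced only for this example:

```python
import re

# Same pattern as above: anchor, literal text with 1+ digits, capture group
PATTERN = re.compile(r'^Give me \d+ of (\S+)')

def extract_item(sentence):
    """Return the token after 'Give me <number> of ', or None if there is no match."""
    match = PATTERN.search(sentence)
    return match.group(1) if match else None

print(extract_item('Give me 4 of ABCD_X and then do something'))
print(extract_item('Give me 10 of ABCD_Y'))
print(extract_item('Please give me 4 of ABCD_X'))
```

Because the value comes from the capture group rather than the whole match, no lookbehind is needed. Unlike the (?=\s) lookahead in the question, \S+ also matches when the token is the last thing in the string.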

For example:

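import pandas as pd

# Two sample sentences, one of them with a multi-digit number
s = pd.Series(['Give me 4 of ABCD_X and then do something', 'Give me 10 of ABCD_Y and then do something'])
# str.extract returns a DataFrame with one column per capture group
extracted = s.str.extract(r'^Give me \d+ of (\S+)')
print(extracted)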

Prints:

   0
0  ABCD_X
1  ABCD_Y

Note: If you use a named capture group, the column header will be the name of that group instead of the group's integer index.
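A short sketch of the named-group variant follows. The group name item is an arbitrary choice made for this example, not something taken from the question:

```python
import pandas as pd

s = pd.Series(['Give me 4 of ABCD_X and then do something',
               'Give me 10 of ABCD_Y and then do something'])

# (?P<item>...) names the capture group; 'item' is a hypothetical name
result = s.str.extract(r'^Give me \d+ of (?P<item>\S+)')

print(result.columns.tolist())
print(result['item'].tolist())
```

The resulting column can then be selected by name (here result['item']) rather than by the integer 0.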

edited Sep 21 '21 at 06:01

answered Sep 21 '21 at 05:56

JvdV



This works perfectly, thanks a lot! Especially for the detailed explanation and testing it on pandas! – YevKad Sep 21 '21 at 07:03
