Extract Numbers from a String Column that contains certain substrings

Question

I am looking to Extract Numbers in another Column from a column Work_Experience

**Work_Experience**  
3 Month  
7 month  
11 Months  
8 months  
0

and I am looking for following output:

**Work_Experience**  ||   **Output**  
3 Month                   3  
7 month                   7  
11 Months                 11  
8 months                  8  
0                         0

This is what I tried

Experian_P2['Output'] = Experian_P2['Work_Experience'].str.contains('|'.join(['month', 'Month','months','Months'])).str.extract("(\d{1,2})")

score 2 · Answer 1 · answered Jan 09 '23 at 12:15

2

You can use

df['Output'] = df['Work_Experience'].str.extract(r'(?i)(\d+)\s*month').fillna(0)

Pattern details:

(?i) - case insensitive match
(\d+) - Group 1: one or more digits
\s* - zero or more whitespaces
month - a month string.

answered Jan 09 '23 at 12:15

Wiktor Stribiżew

607,720
39
448
563

score 0 · Answer 2 · answered Jan 09 '23 at 12:13

0

if you know that first part will always be a number you can try this simple snippet

a = ['3 Month', '7 month', '11 Months', '8 months', '0']

for x in a:
    parts = x.split(' ')

    print (x, int(parts[0]))

answered Jan 09 '23 at 12:13

ashish singh

6,526
2
15
35

score 0 · Answer 3 · answered Jan 09 '23 at 12:15

Assuming pandas here.

You command doesn't really make sense as you are piping one command to generate a boolean Series, and then are trying to extract text from it:

Experian_P2['Work_Experience'].str.contains('|'.join(['month', 'Month','months','Months'])).str.extract("(\d{1,2})"

I think you want to slice and extract on this subset:

m = Experian_P2['Work_Experience'].str.contains('months?', case=False)

Experian_P2.loc[m, 'Output'] = Experian_P2.loc[m, 'Work_Experience'].str.extract("(\d{1,2})", expand=False)

Extract Numbers from a String Column that contains certain substrings

3 Answers3