-1

I am looking to Extract Numbers in another Column from a column Work_Experience

**Work_Experience**  
3 Month  
7 month  
11 Months  
8 months  
0

and I am looking for following output:

**Work_Experience**  ||   **Output**  
3 Month                   3  
7 month                   7  
11 Months                 11  
8 months                  8  
0                         0  

This is what I tried

Experian_P2['Output'] = Experian_P2['Work_Experience'].str.contains('|'.join(['month', 'Month','months','Months'])).str.extract("(\d{1,2})")
Wiktor Stribiżew
  • 607,720
  • 39
  • 448
  • 563

3 Answers3

2

You can use

df['Output'] = df['Work_Experience'].str.extract(r'(?i)(\d+)\s*month').fillna(0)

Pattern details:

  • (?i) - case insensitive match
  • (\d+) - Group 1: one or more digits
  • \s* - zero or more whitespaces
  • month - a month string.
Wiktor Stribiżew
  • 607,720
  • 39
  • 448
  • 563
0

if you know that first part will always be a number you can try this simple snippet

a = ['3 Month', '7 month', '11 Months', '8 months', '0']

for x in a:
    parts = x.split(' ')

    print (x, int(parts[0]))
ashish singh
  • 6,526
  • 2
  • 15
  • 35
0

Assuming here.

You command doesn't really make sense as you are piping one command to generate a boolean Series, and then are trying to extract text from it:

Experian_P2['Work_Experience'].str.contains('|'.join(['month', 'Month','months','Months'])).str.extract("(\d{1,2})"

I think you want to slice and extract on this subset:

m = Experian_P2['Work_Experience'].str.contains('months?', case=False)

Experian_P2.loc[m, 'Output'] = Experian_P2.loc[m, 'Work_Experience'].str.extract("(\d{1,2})", expand=False)
mozway
  • 194,879
  • 13
  • 39
  • 75