-3

I have a data frame with a column like the one below:

Years in current job
< 1 year
10+ years
9 years
1 year

I want to use regex or any other technique in Python to get this result:

Years in current job
1
10
9
1

I came up with something like this, but I guess it can be done in a better way using regex:

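frame["Years in current job"] = frame["Years in current job"].str.replace(" ", "", regex=False)
frame["Years in current job"] = frame["Years in current job"].str.replace("<", "", regex=False)
frame["Years in current job"] = frame["Years in current job"].str.replace("+", "", regex=False)
# Replace "years" before "year" so no stray "s" is left behind
frame["Years in current job"] = frame["Years in current job"].str.replace("years", "", regex=False)
frame["Years in current job"] = frame["Years in current job"].str.replace("year", "", regex=False)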
billboard
  • Welcome to StackOverflow. Please read and follow the posting guidelines in the help documentation. [on topic](http://stackoverflow.com/help/on-topic) and [how to ask](http://stackoverflow.com/help/how-to-ask) apply here. StackOverflow is not a coding or tutorial service. – Prune Oct 06 '16 at 23:40

2 Answers

1
df['Years in current job'] = df['Years in current job'].str.replace(r'\D+', '', regex=True).astype('int')

The regex \D+ matches runs of non-digit characters, which are then replaced with an empty string.


I found this on SO: https://stackoverflow.com/a/22591024/1832058
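As a quick check, here is a minimal sketch that applies the same replacement to the sample values from the question. The DataFrame is rebuilt by hand for illustration, and regex=True is passed explicitly because newer pandas versions treat the pattern as a literal string by default.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame(
    {"Years in current job": ["< 1 year", "10+ years", "9 years", "1 year"]}
)

# Strip every run of non-digit characters, then convert what is left to int
df["Years in current job"] = (
    df["Years in current job"].str.replace(r"\D+", "", regex=True).astype(int)
)

print(df["Years in current job"].tolist())  # [1, 10, 9, 1]
```

Note that a value containing no digits at all would be reduced to an empty string, and the int conversion would then fail, so this approach assumes every row holds at least one digit.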

furas
0
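import re

def extract_nums(txt):
    """Return the first run of digits in txt as an int, or -1 if there is none."""
    try:
        return int(re.search(r'([0-9]+)', txt).group(1))
    except (AttributeError, TypeError):
        # AttributeError: no match, so re.search returned None
        # TypeError: txt is not a string (e.g. NaN)
        return -1

df['Years in current job'] = df['Years in current job'].apply(extract_nums)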

EDIT - adding context per suggestion below

This could be done easily enough with string methods, but I'll offer an approach using regex, since that might be helpful for more complicated tasks.

re.search with parentheses finds the digits you're looking for, group extracts the match inside the parentheses, and try/except handles the case where there is no match. Then just pass that function to the pandas.Series apply() method.

regex search: https://docs.python.org/2/library/re.html#regular-expression-objects

apply method: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.apply.html
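For comparison, the same idea can probably be written without apply() by using the pandas string accessor. This is only a sketch, not part of the original answer: the "unknown" row is a hypothetical value added to show the no-match case, and the -1 sentinel simply mirrors extract_nums above.

```python
import pandas as pd

# Sample data from the question, plus one hypothetical value with no digits
s = pd.Series(["< 1 year", "10+ years", "9 years", "1 year", "unknown"])

# str.extract returns the first capture group, or a missing value where nothing matches
digits = s.str.extract(r"(\d+)", expand=False)

# Convert to numbers, use -1 for rows without a match, then cast to int
years = pd.to_numeric(digits).fillna(-1).astype(int)

print(years.tolist())  # [1, 10, 9, 1, -1]
```

The vectorized form avoids calling a Python function once per row, which may matter on large frames; the apply() version is easier to extend when the parsing logic becomes more involved.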

kmh
  • While this code snippet may answer the question, it doesn't provide any context to explain how or why. Consider adding a sentence or two to explain your answer. – brandonscript Oct 07 '16 at 03:24