
I have a DataFrame that looks like this:

df = pd.DataFrame({
    'name': ['John','Mary', 'Phil', 'Sue', 'Robert', 'Lucy', 'Blake'],
    'age': ['15', '20s', 37, 'teen', '', 'elderly', 57]
    })

df


    name     age
0   John     15          
1   Mary     20s
2   Phil     37
3   Sue      teen
4   Robert  
5   Lucy     elderly
6   Blake    57

I would like to:

  1. convert the age column to integers (where the value is already an integer, or where one can be deduced, e.g. from a numeric string)
  2. otherwise replace with NaN

Here is what I'm looking to get:

    name     age
0   John     15            <--- was originally a string
1   Mary     NaN
2   Phil     37
3   Sue      NaN
4   Robert   NaN
5   Lucy     NaN
6   Blake    57

How would I do this?

Henry Ecker
equanimity

2 Answers


You could use the apply method with a function that tries int() on each value and falls back to NaN when the conversion fails.

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'name': ['John','Mary', 'Phil', 'Sue', 'Robert', 'Lucy', 'Blake'],
    'age': ['15', '20s', 37, 'teen', '', 'elderly', 57]
    })

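def func(x):
    # int() raises ValueError for strings such as '20s' or ''
    try:
        return int(x)
    except (ValueError, TypeError):
        return np.nan  # NumPy has no np.NA; np.nan is the missing-value marker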

df['age'] = df['age'].apply(func)
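
One thing to be aware of: NaN is a float, so a column that mixes integers and NaN typically comes back as floats (15.0, 37.0, ...) rather than integers. The sketch below is one possible way around that, assuming pandas' nullable 'Int64' dtype is acceptable for your use case. The helper name to_int_or_nan is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Mary', 'Phil', 'Sue', 'Robert', 'Lucy', 'Blake'],
    'age': ['15', '20s', 37, 'teen', '', 'elderly', 57]
})

def to_int_or_nan(x):
    # Hypothetical helper: int() where possible, NaN otherwise
    try:
        return int(x)
    except (ValueError, TypeError):
        return np.nan

as_float = df['age'].apply(to_int_or_nan)

# Cast to the nullable integer dtype so missing values show as <NA>
# and the remaining values stay integers
as_nullable_int = as_float.astype('Int64')
print(as_nullable_int)
```

If you only need NaN for later filtering or arithmetic, the float column may be fine as it is.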
Ben Grossmann

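Use pd.to_numeric with errors='coerce' to turn anything non-numeric into NaN, then cast to the nullable 'Int64' dtype. A regular int64 column cannot hold NaN, so without the cast the result would be float.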

df['new'] = pd.to_numeric(df['age'],errors = 'coerce').astype('Int64')
df
Out[26]: 
     name      age   new
0    John       15    15
1    Mary      20s  <NA>
2    Phil       37    37
3     Sue     teen  <NA>
4  Robert           <NA>
5    Lucy  elderly  <NA>
6   Blake       57    57
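
A caveat worth checking against your real data: pd.to_numeric also accepts non-integer numbers, so a value like '3.5' would become 3.5, and the cast to 'Int64' would then fail. The following is a rough sketch that treats such values as missing too. That choice is an assumption about what you want, and to_nullable_int is a hypothetical helper name, not a pandas function.

```python
import pandas as pd

def to_nullable_int(series):
    # Anything that cannot be parsed as a number becomes NaN
    numbers = pd.to_numeric(series, errors='coerce')
    # Keep only whole numbers; fractional values (and NaN) become NaN
    whole = numbers.where(numbers % 1 == 0)
    return whole.astype('Int64')

# Same ages as in the question, plus an extra '3.5' to show the edge case
ages = pd.Series(['15', '20s', 37, 'teen', '', 'elderly', 57, '3.5'])
result = to_nullable_int(ages)
print(result)
```

To overwrite the original column rather than add a new one, you could assign the result back with df['age'] = to_nullable_int(df['age']).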
BENY