How to convert a column of ints, NaN, and strings to only integers and NaNs

Question

I have a DataFrame that looks like this:

df = pd.DataFrame({
    'name': ['John','Mary', 'Phil', 'Sue', 'Robert', 'Lucy', 'Blake'],
    'age': ['15', '20s', 37, 'teen', '', 'elderly', 57]
    })

df


    name     age
0   John     15          
1   Mary     20s
2   Phil     37
3   Sue      teen
4   Robert  
5   Lucy     elderly
6   Blake    57

I would like to:

convert the age column into integers (where there is an integer already, or where one is able to be deduced, e.g. from a string)
otherwise replace with NaN

Here is what I'm looking to get:

    name     age
0   John     15            <--- was originally a string
1   Mary     NaN
2   Phil     37
3   Sue      NaN
4   Robert   NaN
5   Lucy     NaN
6   Blake    57

How would I do this?

Ben Grossmann · Answer 1 · 2022-11-02T01:49:17.243

0

You could do the following, using the apply method.

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'name': ['John','Mary', 'Phil', 'Sue', 'Robert', 'Lucy', 'Blake'],
    'age': ['15', '20s', 37, 'teen', '', 'elderly', 57]
    })

def func(x):
    try:
        return int(x)
    except (ValueError, TypeError):
        # not parseable as an integer (e.g. '20s', 'teen', '')
        return np.nan

df['age'] = df['age'].apply(func)
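
One thing to watch with the apply approach: a column that mixes Python ints with NaN is typically stored as float64, so the ages would display as 15.0, 37.0 and so on. A minimal sketch of one way around that, assuming the nullable 'Int64' dtype is acceptable for your use case (the helper name to_int_or_nan is only illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Mary', 'Phil', 'Sue', 'Robert', 'Lucy', 'Blake'],
    'age': ['15', '20s', 37, 'teen', '', 'elderly', 57]
})

def to_int_or_nan(x):
    # hypothetical helper name, same idea as func above
    try:
        return int(x)
    except (ValueError, TypeError):
        return np.nan

converted = df['age'].apply(to_int_or_nan)

# Casting to the nullable integer dtype keeps whole numbers as integers
# and represents the missing entries as <NA>.
df['age'] = converted.astype('Int64')
```

If a float column is fine for you, the final astype call can simply be dropped.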

edited Nov 02 '22 at 01:49

answered Nov 02 '22 at 01:41


score 0 · Answer 2 · answered Nov 02 '22 at 01:49

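Use pd.to_numeric with errors='coerce' to turn unparseable values into NaN, then convert to the nullable 'Int64' dtype. A regular int64 column cannot hold NaN, so the nullable dtype is what lets integers and missing values sit in the same column.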

df['new'] = pd.to_numeric(df['age'],errors = 'coerce').astype('Int64')
df
Out[26]: 
     name      age   new
0    John       15    15
1    Mary      20s  <NA>
2    Phil       37    37
3     Sue     teen  <NA>
4  Robert           <NA>
5    Lucy  elderly  <NA>
6   Blake       57    57
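
A possible edge case, not present in the question's data: pd.to_numeric would also parse a string such as '3.5', and casting a non-whole float to 'Int64' is expected to fail. The following is a rough sketch of one way to guard against that, under the assumption that fractional values should become missing as well; the extra '3.5' entry and the variable names are made up for illustration.

```python
import pandas as pd

# Sample ages from the question plus one hypothetical fractional string
ages = pd.Series(['15', '20s', 37, 'teen', '', 'elderly', 57, '3.5'])

# Unparseable entries become NaN; everything else becomes a float
num = pd.to_numeric(ages, errors='coerce')

# Keep only whole numbers; NaN and fractional values fail the comparison
whole = num.where(num % 1 == 0)

# Now every remaining value is a whole number, so the cast is safe
result = whole.astype('Int64')
```

Whether a fractional age should be dropped, rounded, or truncated depends on the data, so treat the where step as a placeholder for whatever rule fits.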
