I have a pandas dataframe. One of my columns should only be floats. When I try to convert that column to floats, I'm alerted that there are strings in there. I'd like to delete all rows where values in this column are strings...
4 Answers
Use `convert_objects` with param `convert_numeric=True`; this will coerce any non-numeric values to `NaN`:
In [24]:
df = pd.DataFrame({'a': [0.1,0.5,'jasdh', 9.0]})
df
Out[24]:
a
0 0.1
1 0.5
2 jasdh
3 9
In [27]:
df.convert_objects(convert_numeric=True)
Out[27]:
a
0 0.1
1 0.5
2 NaN
3 9.0
You can then drop them:
In [29]:
df.convert_objects(convert_numeric=True).dropna()
Out[29]:
a
0 0.1
1 0.5
3 9.0
UPDATE
Since version 0.17.0 this method is deprecated and you need to use `pd.to_numeric` instead. Unfortunately, this operates on a `Series` rather than a whole `df`, so the equivalent code is now:
df.apply(lambda x: pd.to_numeric(x, errors='coerce')).dropna()
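When the frame has several columns and only one of them must be numeric (the 'name' and 'age' case raised in the comments), coercing every column would turn the string columns into `NaN` as well. A minimal sketch that coerces a single column and drops only the rows that fail there, assuming a made-up frame whose values are purely illustrative:

```python
import pandas as pd

# Hypothetical data: 'name' must stay as strings, 'age' must be numeric.
df = pd.DataFrame({'name': ['ann', 'bob', 'cy'],
                   'age': [31, 'unknown', 27.5]})

# Coerce only the 'age' column; entries that cannot be parsed become NaN.
df['age'] = pd.to_numeric(df['age'], errors='coerce')

# Drop only the rows where 'age' is NaN, leaving other columns untouched.
cleaned = df.dropna(subset=['age'])
print(cleaned)
```

Using `dropna(subset=...)` keeps rows that happen to have missing values in other columns, which a bare `dropna()` would also remove.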

- Thanks for this! My dataframe has multiple columns. Some columns need to have strings. For instance, I have a column 'name' and a column 'age'. The column 'age' needs to be numeric. I tried: df.age.convert_objects(convert_numeric=True) and got 'Series' object has no attribute 'convert_objects'. – porteclefs Nov 06 '14 at 17:03
- You need to do `df[['age']].convert_objects(convert_numeric=True)` in that case – EdChum Nov 06 '14 at 17:04
- Oh I see, so [['age']] picks out the column in df. Very helpful. However, I'm getting a TypeError: convert_objects() got an unexpected keyword argument 'convert_numeric'. I just checked the documentation and 'convert_numeric = True' is the correct argument. Thoughts? – porteclefs Nov 06 '14 at 17:14
- Okay, I think that my pandas is out of date. Updating now. – porteclefs Nov 06 '14 at 17:25
- Hi. I get a 'convert_objects deprecated' FutureWarning when trying to use this. Any suggestions? – magicsword Nov 06 '17 at 19:50
- @magicsword that was deprecated some time ago; `pandas` moves quickly. It's recommended to use `pd.to_numeric` nowadays, so the above becomes `df.apply(lambda x: pd.to_numeric(x, errors='coerce')).dropna()` – EdChum Nov 06 '17 at 19:56
One of my columns should only be floats. I'd like to delete all rows where values in this column are strings
You can convert your series to numeric via `pd.to_numeric` and then use `pd.Series.notnull`. Conversion to `float` is required as a separate step to avoid your series reverting to `object` dtype.
# Data from @EdChum
df = pd.DataFrame({'a': [0.1, 0.5, 'jasdh', 9.0]})
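# .copy() makes res an independent frame, so the assignment below does not raise SettingWithCopyWarning
res = df[pd.to_numeric(df['a'], errors='coerce').notnull()].copy()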
res['a'] = res['a'].astype(float)
print(res)
a
0 0.1
1 0.5
3 9.0
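To see why the separate cast is needed, the dtype can be inspected before and after it. This is a small sketch using the same data; the names `mask` and `res` are just illustrative:

```python
import pandas as pd

df = pd.DataFrame({'a': [0.1, 0.5, 'jasdh', 9.0]})

# Boolean mask: True where the value can be parsed as a number.
mask = pd.to_numeric(df['a'], errors='coerce').notnull()

# Filtering keeps the original values, so the column is still object dtype.
res = df[mask].copy()
dtype_before = res['a'].dtype

# An explicit cast turns the remaining values into a float column.
res['a'] = res['a'].astype(float)
dtype_after = res['a'].dtype

print(dtype_before, dtype_after)
```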

Assume your data frame is `df` and you want to ensure that all data in one of its columns (at position `n`) is numeric with a specific pandas dtype, e.g. `float`. Note that this replaces non-numeric values with 0 rather than dropping their rows:
df[df.columns[n]] = pd.to_numeric(df[df.columns[n]], errors='coerce').fillna(0).astype(float)

You can find the data type of a column from the `dtype.kind` attribute, with something like `df[col].dtype.kind`. See the numpy docs for more details. Transpose the dataframe to go from indices to columns.
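As a rough illustration of the `dtype.kind` check, the sketch below builds a frame with one mixed column and one float column (the column `b` is an assumption added for contrast) and collects the kind code of each:

```python
import pandas as pd

# 'a' mixes floats and a string, so pandas stores it as object dtype;
# 'b' is a hypothetical all-float column for comparison.
df = pd.DataFrame({'a': [0.1, 0.5, 'jasdh', 9.0],
                   'b': [1.0, 2.0, 3.0, 4.0]})

# Map each column name to its one-character numpy kind code.
kinds = {col: df[col].dtype.kind for col in df.columns}
print(kinds)

# After coercion the mixed column becomes a float column.
coerced_kind = pd.to_numeric(df['a'], errors='coerce').dtype.kind
print(coerced_kind)
```

A kind of `'O'` flags an object column that may contain strings, while `'f'` indicates a floating-point column. The kind alone does not say which rows hold the offending values, so it works best as a check before or after the coercion step.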
