How to remove rows in a dataframe that contain non-float/integer values

Question

Given a rather large dataframe, I am looking to preprocess the inputs by standardizing them using the sklearn preprocessing module.

However, this error shows up:

ValueError: could not convert string to float:

How do I go about removing ANY row containing a non-float/integer type of value from my pandas DataFrame?

Here's the type of dataframe I have.

In [1]: df = pd.DataFrame([[0.02, 0.32], ['1 04', '2 64'], ['2 06', '4 96']], columns=['A', 'B'])

Out[2]: 
      A     B
0  0.02  0.32
1  1 04  2 64
2  2 06  4 96

Here's what I want to achieve:

In [1]: df = pd.DataFrame([[1, 2], [1, 'a'], [4, 6]], columns=['A', 'B'])

# Eliminate the space used as a decimal separator and use a dot instead.

Out[2]: 
      A     B
0  0.02  0.32
1  1.04  2.64
2  2.06  4.96

pd.to_numeric(unscaled_inputs_all, errors='coerce') gives off an error "TypeError: arg must be a list, tuple, 1-d array, or Series" — ThePhaenom, Mar 16 '21 at 14:54
Is your numpy array not one dimensional? Can we see some sample data? — It_is_Chris, Mar 16 '21 at 14:55
Do you have a numpy array or a pandas dataframe? Either way, please provide some sample data and the full traceback of the error. https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples — It_is_Chris, Mar 16 '21 at 15:03
I have updated the question, thank you for the source on how to improve question-making — ThePhaenom, Mar 16 '21 at 15:29
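
The TypeError quoted in the first comment comes from passing a whole DataFrame to pd.to_numeric, which accepts only one-dimensional input. A minimal sketch of the usual workaround is to apply it column by column. The sample values below are made up for illustration; only the variable name is taken from the comment.

```python
import pandas as pd

# Hypothetical sample data; the real unscaled_inputs_all is not shown in the thread
unscaled_inputs_all = pd.DataFrame({"A": [1, 1, 4], "B": [2, "a", 6]})

# pd.to_numeric handles one column (Series) at a time, so apply it per column.
# errors="coerce" turns anything that cannot be parsed into NaN instead of raising.
converted = unscaled_inputs_all.apply(pd.to_numeric, errors="coerce")

print(converted)
print(converted.dtypes)
```

After this step every column is numeric, and the unparseable entries show up as NaN, which makes them easy to locate or drop.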

score 0 · Answer 1 · answered Mar 16 '21 at 14:54


I often use this to build a numeric-only dataset from the data:

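# Numeric dtypes to keep; note that this selects whole columns, not rows
numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']
# display() is available in IPython/Jupyter; use print() in a plain script
display(df.select_dtypes(include=numerics).columns)
df_numerics = df.select_dtypes(include=numerics)
df_numerics.head()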

answered Mar 16 '21 at 14:54

Prima


That worked perfectly, thank you. I used it on a pandas DataFrame. Afterwards, I converted it back to a numpy array. – ThePhaenom Mar 16 '21 at 15:05
After all, it did not work perfectly, as it also eliminated entire columns. I only wanted it to eliminate the rows in which the invalid non-numeric values showed up. – ThePhaenom Mar 16 '21 at 15:23
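
For the row-removal goal described in the comment above, one possible approach (a sketch, not taken from either answer) is to coerce every column to numeric and keep only the rows where nothing turned into NaN. The frame is the small example from the question, with the stray value written as the string 'a'. This assumes the data has no genuine missing values, since those rows would be dropped as well.

```python
import pandas as pd

# Small example from the question: row 1 holds a non-numeric value
df = pd.DataFrame([[1, 2], [1, "a"], [4, 6]], columns=["A", "B"])

# Convert column by column; values that cannot be parsed become NaN
numeric = df.apply(pd.to_numeric, errors="coerce")

# Boolean mask: True for rows in which every column parsed as a number
valid_rows = numeric.notna().all(axis=1)

# Keep only the fully numeric rows (whole columns are left in place)
cleaned = numeric[valid_rows]
print(cleaned)
```

Unlike select_dtypes, this filters along the row axis, so a single bad cell removes only its own row.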

It_is_Chris · Accepted Answer · 2021-03-16T15:54:04.923


You can use stack with str.replace to swap the space for a dot, then cast back to float:

new_df = df.stack().astype(str).str.replace(' ', '.').astype(float).unstack()

      A     B
0  0.02  0.32
1  1.04  2.64
2  2.06  4.96

print(new_df.dtypes)

A    float64
B    float64
dtype: object
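
A column-wise variant of the same idea, sketched under the assumption that a space is the only malformed decimal separator in the data. fix_decimal_space is a hypothetical helper name, and pd.to_numeric with errors="coerce" is swapped in for astype(float) so that any value that still fails to parse becomes NaN instead of raising.

```python
import pandas as pd

# Sample frame from the question, with the malformed numbers stored as strings
df = pd.DataFrame(
    [[0.02, 0.32], ["1 04", "2 64"], ["2 06", "4 96"]],
    columns=["A", "B"],
)

def fix_decimal_space(col):
    """Hypothetical helper: treat a space as the decimal separator."""
    # regex=False makes the replacement a plain literal substitution
    return col.astype(str).str.replace(" ", ".", regex=False)

# Repair the separator per column, then convert; leftovers become NaN
new_df = df.apply(fix_decimal_space).apply(pd.to_numeric, errors="coerce")

print(new_df)
print(new_df.dtypes)
```

Any rows that still contain NaN after this step can be removed with new_df.dropna() before standardizing.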

edited Mar 16 '21 at 15:54

answered Mar 16 '21 at 15:38

It_is_Chris

