
Given a rather large dataframe, I am looking to preprocess the inputs by standardizing them using the sklearn preprocessing module.

However, this error shows up:

ValueError: could not convert string to float:

How do I go about removing ANY row containing a non-float/integer type of value from my pandas DataFrame?

Here's the type of dataframe I have.

In [1]: df = pd.DataFrame([[0.02, 0.32], ['1 04', '2 64'], ['2 06', '4 96']], columns=['A', 'B'])

Out[2]: 
      A     B
0  0.02  0.32
1  1 04  2 64
2  2 06  4 96

Here's what I want to achieve:

In [1]: df = pd.DataFrame([[1, 2], [1, 'a'], [4, 6]], columns=['A', 'B'])

# Eliminate the space used as a decimal separator and use a dot instead.

Out[2]: 
      A     B
0  0.02  0.32
1  1.04  2.64
2  2.06  4.96
desertnaut
ThePhaenom

2 Answers


I often use this to keep only the numeric columns of a DataFrame:

numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']
display(df.select_dtypes(include=numerics).columns)
df_numerics = df.select_dtypes(include=numerics)
df_numerics.head()
Prima
  • That worked perfectly, thank you. I used it on a pandas DataFrame. Afterwards, I converted it back to a numpy array. – ThePhaenom Mar 16 '21 at 15:05
  • After all, it did not work perfectly, as it also eliminated entire columns. I only wanted it to eliminate the rows in which the invalid non-numeric values showed up. – ThePhaenom Mar 16 '21 at 15:23
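
For dropping rows rather than columns, one possible approach is to coerce every column with pd.to_numeric and then drop the rows that end up with missing values. This is only a minimal sketch on a small illustrative frame, not part of the answer above. One caveat: rows that already contained NaN before the conversion are dropped as well.

```python
import pandas as pd

# Small illustrative frame: row 1 holds a non-numeric string in column B.
df = pd.DataFrame([[1, 2], [1, "a"], [4, 6]], columns=["A", "B"])

# Convert each column; values that cannot be parsed as numbers become NaN.
converted = df.apply(pd.to_numeric, errors="coerce")

# Drop every row that now contains at least one NaN.
clean = converted.dropna()
print(clean)
```

The result keeps both columns and only loses the row with the unparseable value.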

You can use stack with str.replace:

new_df = df.stack().astype(str).str.replace(' ', '.').astype(float).unstack()

      A     B
0  0.02  0.32
1  1.04  2.64
2  2.06  4.96

print(new_df.dtypes)

A    float64
B    float64
dtype: object
It_is_Chris