I have a dataset that has numerical values, empty values and text values. I want to do the following in pandas:

  1. Numerical Values -> Float
  2. Empty Values -> N/A
  3. Text Values -> N/A

When I try to run astype('float'), I get an error:

import pandas as pd
data = ['5', '4', '3', '', 'NO DATA ', '5']
df = pd.DataFrame({'data': data})

df[['data']].astype('float')  # raises ValueError

I've looked through the documentation and Stack Overflow, but I couldn't find out how to do this.

Al-Baraa El-Hag

1 Answer

Using pandas' to_numeric function, we can convert every valid value to a float while turning invalid values into NaN:

import pandas as pd
data = ['5', '4', '3', 'NO DATA', '', '5']
df = pd.DataFrame({'data': data})

df['data'] = pd.to_numeric(df['data'], errors='coerce')

Passing errors='coerce' ensures that invalid values are turned into NaN instead of raising an error.
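
To make the difference concrete, here is a minimal sketch comparing the default behaviour of to_numeric with errors='coerce'. The variable names (s, raised, coerced) are only illustrative and not part of the original code:

```python
import pandas as pd

s = pd.Series(['5', '4', '3', 'NO DATA', '', '5'])

# Default behaviour (errors='raise'): an unparseable string such as
# 'NO DATA' makes to_numeric raise a ValueError.
try:
    pd.to_numeric(s)
    raised = False
except ValueError:
    raised = True

# errors='coerce': anything that cannot be parsed becomes NaN instead.
coerced = pd.to_numeric(s, errors='coerce')

print(raised)
print(coerced)
```

The coerced series keeps the same index as the input, so it can be assigned straight back to the original column.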

And the result will be:

   data
0   5.0
1   4.0
2   3.0
3   NaN
4   NaN
5   5.0
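
Since to_numeric expects one-dimensional input such as a single column, converting several columns at once (as the question's df[['data']] suggests) needs one extra step. A possible sketch, assuming a hypothetical second column named 'other', uses DataFrame.apply to run to_numeric on each column:

```python
import pandas as pd

df = pd.DataFrame({
    'data': ['5', '4', '3', 'NO DATA', '', '5'],
    'other': ['1.5', 'x', '2', '', '7', 'n/a'],  # hypothetical second column
})

# apply calls to_numeric once per column and forwards errors='coerce' to it.
converted = df.apply(pd.to_numeric, errors='coerce')

# Count how many entries in each column ended up as NaN.
nan_counts = converted.isna().sum()

print(converted)
print(nan_counts)
```

Counting the NaN values afterwards is a quick way to see how many entries were empty or non-numeric.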
Adid
    Questions like this have been asked many times, and new ones should be closed as duplicates of the originals. – Nick Mar 12 '23 at 06:17