How to convert Object to Float in Python

Question

I have the following dataframe:

     Daily_KWH_System  year  month  day  hour  minute  second
0         4136.900384  2016      9    7     0       0       0
1         3061.657187  2016      9    8     0       0       0
2         4099.614033  2016      9    9     0       0       0
3         3922.490275  2016      9   10     0       0       0
4         3957.128982  2016      9   11     0       0       0
5         4177.014316  2016      9   12     0       0       0
6         3077.103445  2016      9   13     0       0       0
7         4123.103795  2016      9   14     0       0       0
..                ...   ...    ...  ...   ...     ...     ...
551               NaN  2016     11   23     0       0       0
552               NaN  2016     11   24     0       0       0
553               NaN  2016     11   25     0       0       0
..                ...   ...    ...  ...   ...     ...     ...
579               NaN  2016     11   27     0       0       0
580               NaN  2016     11   28     0       0       0

The variable types are as follows:

print(df.dtypes)

Daily_KWH_System    object
year                 int32
month                int32
day                  int32
hour                 int32
minute               int32
second               int32

I need to convert "Daily_KWH_System" to float so that I can use it in a linear regression model.

I tried the below code, which worked fine.

 df['Daily_KWH_System'] = pd.to_numeric(df['Daily_KWH_System'], errors='coerce')
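For readers following along, here is a minimal sketch of what that call does on an object column. The sample values (including the `"n/a"` entry) are made up for illustration and are not taken from the dataframe above; with `errors='coerce'`, anything that cannot be parsed as a number becomes `NaN`.

```python
import pandas as pd

# Hypothetical sample: two numeric strings and one unparseable entry,
# stored with object dtype like the column in the question.
s = pd.Series(["4136.900384", "3061.657187", "n/a"], dtype=object)

# errors="coerce" turns values that cannot be parsed into NaN
# instead of raising an exception.
converted = pd.to_numeric(s, errors="coerce")

print(converted.dtype)         # float64
print(converted.isna().sum())  # 1
```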

Then I replaced the NaNs with a blank space so that I could use the column in my model, using the following code:

 df = df.replace(np.nan,' ', regex=True)

But the variable "Daily_KWH_System" is converted back to object as soon as I replace the NaNs.

Please let me know how to go about it.
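The behaviour described in the question can be reproduced with a small sketch. The three-row dataframe below is invented for illustration, and `regex=True` is left out because it is not needed to show the effect: once a string is written into a float column, pandas can only hold the mixed values as `object`.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame standing in for the real data.
df = pd.DataFrame({
    "Daily_KWH_System": ["4136.900384", "3061.657187", None],
    "day": [7, 8, 9],
})

# Step 1: the conversion from the question; the column becomes float64.
df["Daily_KWH_System"] = pd.to_numeric(df["Daily_KWH_System"], errors="coerce")
dtype_before = df["Daily_KWH_System"].dtype

# Step 2: replacing NaN with a blank string mixes floats and strings,
# so the column falls back to object.
blanked = df.replace(np.nan, " ")
dtype_after = blanked["Daily_KWH_System"].dtype

print(dtype_before, dtype_after)
```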

I do not need NaN in my dataframe, because the model only accepts Int/Float. Hence I need those values to be blank, so I can predict those values — Anagha, Feb 09 '17 at 06:59
Hmm, if you check this [question](http://stackoverflow.com/questions/13643363/linear-regression-of-arrays-containing-nans-in-python-numpy), you need to remove the `NaN` values with `Daily_KWH_System = df.loc[df.Daily_KWH_System.notnull(), 'Daily_KWH_System']`. But maybe you need something else... — jezrael, Feb 09 '17 at 07:13
But if you check [this](http://stackoverflow.com/a/19380761/2901002), `NaN` values are not a problem: they are removed by `dropna()`. — jezrael, Feb 09 '17 at 07:15
So a second possible solution is `Daily_KWH_System = df.Daily_KWH_System.dropna()`. Please check how it works. — jezrael, Feb 09 '17 at 07:17
Or, if you need to remove all rows that contain at least one `NaN`: `df = df.dropna()` — jezrael, Feb 09 '17 at 07:17
`object` is being set as the dtype because you tried to force the dtype to float initially using `to_numeric`, but you then replaced the `NaN` with a `str` so the dtype becomes mixed. What are you wanting here? Pandas is doing the correct thing here but is your model going to be able to handle float and strings? — EdChum, Feb 09 '17 at 09:33
@EdChum The model only takes float/int. And I need those values blank, since for the last few rows it would predict the value based on the other variables. — Anagha, Feb 09 '17 at 09:37
Then leave them as `NaN`, it doesn't make sense to make them blank, unless you ignore those and don't pass them to the model — EdChum, Feb 09 '17 at 09:37
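One way to read the advice in these comments is sketched below: keep the column as float with `NaN`, fit only on the rows where the target is known, and predict the rows where it is missing. This is an illustration under stated assumptions, not the asker's actual model. The data is invented and exactly linear, `day` is used as the only predictor, and a plain NumPy least-squares fit stands in for whatever regression library is really in use.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the last two targets are unknown (NaN).
df = pd.DataFrame({
    "Daily_KWH_System": [10.0, 20.0, 30.0, np.nan, np.nan],
    "day": [1, 2, 3, 4, 5],
})

# Boolean mask of rows where the target is known.
known = df["Daily_KWH_System"].notna()

# Design matrix with an intercept column, built from the known rows only.
X_known = np.column_stack([df.loc[known, "day"], np.ones(known.sum())])
y_known = df.loc[known, "Daily_KWH_System"].to_numpy()

# Ordinary least squares fit (slope and intercept).
coef, *_ = np.linalg.lstsq(X_known, y_known, rcond=None)

# Predict the rows whose target was NaN and write the values back.
X_missing = np.column_stack([df.loc[~known, "day"], np.ones((~known).sum())])
df.loc[~known, "Daily_KWH_System"] = X_missing @ coef

print(df["Daily_KWH_System"].dtype)  # float64
```

The column stays `float64` throughout, so no blank placeholder is needed; the `NaN` rows are simply kept out of the fit and filled in afterwards.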
