
I have the following dataframe:

     Daily_KWH_System      year     month  day  hour  minute  second
 0         4136.900384      2016      9    7     0       0       0
 1         3061.657187      2016      9    8     0       0       0
 2         4099.614033      2016      9    9     0       0       0
 3         3922.490275      2016      9   10     0       0       0
 4         3957.128982      2016      9   11     0       0       0
 5         4177.014316      2016      9   12     0       0       0
 6         3077.103445      2016      9   13     0       0       0
 7         4123.103795      2016      9   14     0       0       0
..                ...       ...      ...  ...   ...     ...     ...
551               NaN       2016      11  23     0       0       0
552               NaN       2016      11  24     0       0       0
553               NaN       2016      11  25     0       0       0
..                ...       ...      ...  ...   ...     ...     ...
579               NaN       2016      11  27     0       0       0
580               NaN       2016      11  28     0       0       0

The column dtypes are as follows:

print(df.dtypes)

Daily_KWH_System    object
year                 int32
month                int32
day                  int32
hour                 int32
minute               int32
second               int32

I need to convert "Daily_KWH_System" to float so that I can use it in a linear regression model.

I tried the below code, which worked fine.

 df['Daily_KWH_System'] = pd.to_numeric(df['Daily_KWH_System'], errors='coerce')

Then I replaced the NaNs with a blank space so that I could use the column in my model, using the following code:

 df = df.replace(np.nan,' ', regex=True)

But the column "Daily_KWH_System" is converted back to object as soon as I replace the NaNs.

Please let me know how to go about this.
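
The two steps above can be reproduced with a minimal sketch. The three-row frame and the "n/a" entry are made up for illustration and are not taken from the real data; only the two calls (`pd.to_numeric` and `replace`) come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real frame: numbers stored as strings (object dtype),
# plus one value that cannot be parsed as a number
df = pd.DataFrame({
    "Daily_KWH_System": pd.Series(["4136.900384", "3061.657187", "n/a"], dtype=object),
    "day": [7, 8, 9],
})

# Step 1: coerce to numeric; the unparseable entry becomes NaN
df["Daily_KWH_System"] = pd.to_numeric(df["Daily_KWH_System"], errors="coerce")
dtype_after_to_numeric = df["Daily_KWH_System"].dtype

# Step 2: replace NaN with a blank string; the column now mixes floats and a str
df = df.replace(np.nan, " ")
dtype_after_replace = df["Daily_KWH_System"].dtype

print(dtype_after_to_numeric, dtype_after_replace)
```

In this sketch the column is float after the first step and object after the second, because a single column cannot hold both floats and strings under a numeric dtype.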

Anagha
  • I do not need NaN in my dataframe, because the model only accepts Int/Float. Hence I need those values to be blank, so I can predict those values – Anagha Feb 09 '17 at 06:59
  • Hmm, if you check this [question](http://stackoverflow.com/questions/13643363/linear-regression-of-arrays-containing-nans-in-python-numpy), you need to remove the `NaN` values with `Daily_KWH_System = df.loc[df.Daily_KWH_System.notnull(), 'Daily_KWH_System']`. But maybe you need something else... – jezrael Feb 09 '17 at 07:13
  • But if you check [this](http://stackoverflow.com/a/19380761/2901002), `NaN`s are not a problem: they are removed by `dropna()`. – jezrael Feb 09 '17 at 07:15
  • So a second possible solution is `Daily_KWH_System = df.Daily_KWH_System.dropna()`. Please check how it works. – jezrael Feb 09 '17 at 07:17
  • Or, if you need to remove all rows that contain at least one `NaN`: `df = df.dropna()` – jezrael Feb 09 '17 at 07:17
  • `object` is being set as the dtype because you tried to force the dtype to float initially using `to_numeric`, but you then replaced the `NaN` with a `str` so the dtype becomes mixed. What are you wanting here? Pandas is doing the correct thing here but is your model going to be able to handle float and strings? – EdChum Feb 09 '17 at 09:33
  • @EdChum the model only takes Float/Int. And I need those to be blank, since for the last few rows the model would predict the value based on the other variables. – Anagha Feb 09 '17 at 09:37
  • @jezrael I need all of the rows; they cannot be deleted – Anagha Feb 09 '17 at 09:37
  • Then leave them as `NaN`, it doesn't make sense to make them blank, unless you ignore those and don't pass them to the model – EdChum Feb 09 '17 at 09:37
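
The suggestion to leave the values as `NaN` and simply not pass those rows to the model could look roughly like the sketch below. Everything in it is an assumption for illustration: the six-row frame is invented, only `day` is used as a predictor, and NumPy's least-squares solver stands in for whichever linear regression library is actually used.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the target is known for the first rows and NaN for the last two
df = pd.DataFrame({
    "Daily_KWH_System": [10.0, 12.0, 14.0, 16.0, np.nan, np.nan],
    "day": [1, 2, 3, 4, 5, 6],
})

# Boolean mask of rows whose target is known; these are the rows used for fitting
known = df["Daily_KWH_System"].notna().to_numpy()

# Design matrix with an intercept column; only "day" is used here for brevity
X = np.column_stack([np.ones(len(df)), df["day"].to_numpy(dtype=float)])
y = df.loc[known, "Daily_KWH_System"].to_numpy()

# Ordinary least squares fitted on the known rows only
coef, *_ = np.linalg.lstsq(X[known], y, rcond=None)

# Write predictions into the NaN rows; no row is dropped and the column stays float64
df.loc[~known, "Daily_KWH_System"] = X[~known] @ coef

print(df)
```

The mask keeps every row in the frame, so nothing has to be deleted or blanked out; the `NaN` rows are only held back from fitting and then filled with the predicted values.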

0 Answers