
I want to split the rows of my dataframe that contain null values off into a test set, so that I can train a regression model on the rows with no null values and then use it to predict the missing values.

for i in df1:
    if (df1['dependents'].iloc[i].notnull())==False:
        test[i]=df1[i]

So far I have tried the code above, but it raises an error:

TypeError                                 Traceback (most recent call last)
<ipython-input-13-975c8029ee0e> in <module>
      1 for i in df1:
----> 2     if (df1['dependents'].iloc[i].notnull())==False:
      3         test[i]=df1[i]

~\anaconda3\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key)
   1765 
   1766             maybe_callable = com.apply_if_callable(key, self.obj)
-> 1767             return self._getitem_axis(maybe_callable, axis=axis)
   1768 
   1769     def _is_scalar_access(self, key: Tuple):

~\anaconda3\lib\site-packages\pandas\core\indexing.py in _getitem_axis(self, key, axis)
   2132             key = item_from_zerodim(key)
   2133             if not is_integer(key):
-> 2134                 raise TypeError("Cannot index by location index with a non-integer key")
   2135 
   2136             # validate the location

TypeError: Cannot index by location index with a non-integer key
niraj
  • What is in `df1`? From the error, it seems that `i` is returning non-integer values but `.iloc[]` is expecting integer value – rain01 Jun 14 '20 at 02:08
  • Welcome to SO! Can you please be more specific about what you're trying to accomplish? Are you trying to replace null values in `df1` with corresponding values in `test`? Sample data would help. – Mike Tomaino Jun 14 '20 at 02:29
  • It has 3 columns: age, gender, and dependents. age is of type integer, gender of type object, and dependents of type float. Only dependents contains missing values. – Mridul Arora Jun 14 '20 at 03:07

2 Answers


The following code selects the rows with null values into a separate dataframe:

test = df1[df1['dependents'].isnull()]

`for i in df1` iterates over the column names rather than the rows, so `i` is a string such as 'age' and `.iloc[i]` fails with the TypeError shown in the question. To iterate over the rows, you need to use iterrows() or itertuples(), as explained in this answer:

import pandas as pd
from numpy import nan

# example data
df1 = pd.DataFrame(
    {'age':        [ 30,  16,  40,  40,  30],
     'gender':     ['M', 'F', 'X', 'M', 'F'],
     'dependents': [  2,   0,   2, nan,   3]})

# will hold the non-null rows
train = []
# will hold the null rows
test = []

# use iterrows to loop over rows in the dataframe
for i, row in df1.iterrows():
    if pd.isnull(row['dependents']):
        test.append(row)
    else:
        train.append(row)

# build dataframe from rows
train_df = pd.DataFrame(train)
test_df  = pd.DataFrame(test)
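
As an alternative sketch, the same split can be written with itertuples(), which yields one namedtuple per row and is generally faster than iterrows(). The sample data is made up to match the columns described in the question, and `train_labels` / `test_labels` are names chosen only for this illustration.

```python
import numpy as np
import pandas as pd

# made-up sample data; only 'dependents' has a missing value
df1 = pd.DataFrame(
    {'age':        [30, 16, 40, 40, 30],
     'gender':     ['M', 'F', 'X', 'M', 'F'],
     'dependents': [2, 0, 2, np.nan, 3]})

# index labels of the rows without / with a missing 'dependents' value
train_labels = []
test_labels = []

# each row is a namedtuple; row.Index holds the row's index label
for row in df1.itertuples():
    if pd.isnull(row.dependents):
        test_labels.append(row.Index)
    else:
        train_labels.append(row.Index)

# select the rows by label, which keeps the original column dtypes
train_df = df1.loc[train_labels]
test_df = df1.loc[test_labels]
```

Collecting labels and selecting with `.loc` avoids rebuilding each dataframe from individual row objects.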

However, it's usually not necessary to iterate over rows like this at all. There's a much more efficient way:

train_df = df1[~pd.isnull(df1['dependents'])]
test_df  = df1[pd.isnull(df1['dependents'])]
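
Once the rows are split, the regression step mentioned in the question could look roughly like the sketch below. It is only an illustration under stated assumptions: it fits an ordinary least-squares line with NumPy, uses `age` as the only predictor (`gender` is left out for brevity), and runs on made-up sample data. A real model would likely use more features and a dedicated library.

```python
import numpy as np
import pandas as pd

# made-up sample data; only 'dependents' has a missing value
df1 = pd.DataFrame(
    {'age':        [30, 16, 40, 40, 30],
     'gender':     ['M', 'F', 'X', 'M', 'F'],
     'dependents': [2, 0, 2, np.nan, 3]})

# boolean mask marking the rows to predict
missing = df1['dependents'].isnull()
train_df = df1[~missing]
test_df = df1[missing]

# design matrix: intercept column plus age (assumption: age is the only feature)
X_train = np.column_stack(
    [np.ones(len(train_df)), train_df['age'].to_numpy(dtype=float)])
y_train = train_df['dependents'].to_numpy(dtype=float)

# ordinary least-squares fit
coef, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

# predict 'dependents' for the rows where it was missing
X_test = np.column_stack(
    [np.ones(len(test_df)), test_df['age'].to_numpy(dtype=float)])
predicted = X_test @ coef

# write the predictions into a copy, leaving df1 untouched
filled = df1.copy()
filled.loc[missing, 'dependents'] = predicted
```

The predictions are written back through the same boolean mask that produced the split, so they line up with the rows they belong to.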
anjsimmo