I have written the following function. Calling it raises a KeyError at the dataset.loc[] assignment. I would like to understand why this happens and how to avoid it.

def ChangeColumnValues(dataset, columnValues):
    """Changes the values of given columns into the given key value pairs

    :: Argument Description ::
    dataset - Dataset for which the values are to be updated
    columnValues - Dictionary with Column and Value-Replacement pair
    """

    for column, valuePair in columnValues.items():
        for value, replacement in valuePair.items():
            dataset.loc[str(dataset[column]) == value, column] = replacement

    return dataset

BankDS = da.ChangeColumnValues(BankDS, {
    'Default': {
        'no': -1,
        'yes': 1
    },
    'Housing': {
        'no': -1,
        'yes': 1
    },
    'Loan': {
        'no': -1,
        'yes': 1
    },
    'Y': {
        'no': 0,
        'yes': 1
    }
})

Error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-20-0c766179be88> in <module>()
     30     WineQualityDS = da.MeanNormalize(WineQualityDS)
     31 
---> 32 PreProcessDataSets()

<ipython-input-20-0c766179be88> in PreProcessDataSets()
     20         'Y': {
     21             'no': 0,
---> 22             'yes': 1
     23         }
     24     })

W:\MyProjects\Python\ML\FirstOne\DAHelper\DataSet.py in ChangeColumnValues(dataset, columnValues)
     73     for column, valuePair in columnValues.items():
     74         for value, replacement in valuePair.items():
---> 75             dataset.loc[str(dataset[column]) == value, column] = replacement
     76 
     77     return dataset

C:\Program Files\Anaconda3\lib\site-packages\pandas\core\indexing.py in __setitem__(self, key, value)
    177             key = com._apply_if_callable(key, self.obj)
    178         indexer = self._get_setitem_indexer(key)
--> 179         self._setitem_with_indexer(indexer, value)
    180 
    181     def _has_valid_type(self, k, axis):

C:\Program Files\Anaconda3\lib\site-packages\pandas\core\indexing.py in _setitem_with_indexer(self, indexer, value)
    310                     # reindex the axis to the new value
    311                     # and set inplace
--> 312                     key, _ = convert_missing_indexer(idx)
    313 
    314                     # if this is the items axes, then take the main missing

C:\Program Files\Anaconda3\lib\site-packages\pandas\core\indexing.py in convert_missing_indexer(indexer)
   1963 
   1964         if isinstance(indexer, bool):
-> 1965             raise KeyError("cannot use a single bool to index into setitem")
   1966         return indexer, True
   1967 

KeyError: 'cannot use a single bool to index into setitem'

Also, please let me know if there is a better or more correct way to implement what I am trying to achieve with the ChangeColumnValues function.

  • I'm just glancing at this and not 100% sure what you are doing, but there is a pandas method, `replace()` that takes a dictionary, say, "dct" as an argument. So you probably could do the above as something like `df.replace(dct)`. E.g. see here: https://stackoverflow.com/questions/20250771/remap-values-in-pandas-column-with-a-dict/41678874#41678874 – JohnE Dec 02 '17 at 20:48
  • Thanks for the suggestion @JohnE! I went through the doc and figured out how I can use `replace()` for my requirement. – Benison Sam Dec 02 '17 at 21:18
  • sure, no problem! – JohnE Dec 02 '17 at 21:50

1 Answer

I found the answer after some digging (Google searches) and brainstorming. The corrected function is:

def ChangeColumnValues(dataset, columnValues):
    """Changes the values of given columns into the given key value pairs

    :: Argument Description ::
    dataset - Dataset for which the values are to be updated
    columnValues - Dictionary with Column and Value-Replacement pair
    """

    for column, valuePair in columnValues.items():
        for value, replacement in valuePair.items():
            dataset.loc[dataset[column] == value, column] = replacement

    return dataset

Note that I have removed the str() call from the comparison. str(dataset[column]) converts the whole series into a single string, so comparing it with value yields one scalar boolean, and dataset.loc cannot use a single boolean as a key. Without str(), the comparison is evaluated element by element and produces a boolean series, which is what dataset.loc needs to select the rows where the condition holds.
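To make the difference concrete, here is a minimal sketch on a made-up three-row frame (the data and the names `df`, `scalar_mask` and `series_mask` are illustrative, not taken from BankDS). The column is created with object dtype explicitly, on the assumption that this keeps the mixed string/integer assignment working across pandas versions:

```python
import pandas as pd

# Hypothetical stand-in for one column of BankDS.
df = pd.DataFrame({"Housing": pd.Series(["no", "yes", "no"], dtype=object)})

# str() renders the whole Series as one string, so this is a single bool.
scalar_mask = str(df["Housing"]) == "yes"

# Without str(), the comparison runs element by element.
series_mask = df["Housing"] == "yes"

print(type(scalar_mask).__name__)   # bool
print(series_mask.tolist())         # [False, True, False]

# A boolean Series is a valid row selector for .loc.
df.loc[series_mask, "Housing"] = 1
print(df["Housing"].tolist())       # ['no', 1, 'no']
```

Passing `scalar_mask` to `df.loc[...]` instead is what led to the KeyError shown in the question.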

I am new to Python; if my understanding is wrong here, please correct me!

Edit:

As suggested by @JohnE, what I was trying to achieve can also be done easily with the pandas replace() method. Here is the corresponding implementation, in case it helps someone:

BankDS = BankDS.replace({
    'Default': {
        'no': -1,
        'yes': 1
    },
    'Housing': {
        'no': -1,
        'yes': 1
    },
    'Loan': {
        'no': -1,
        'yes': 1
    },
    'Y': {
        'no': 0,
        'yes': 1
    }
})
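For anyone who wants to try this without the original data, below is a small self-contained sketch of the same nested-dictionary form of replace(). The frame `bank`, its values, and the `Age` column are invented for illustration; only the column-to-mapping structure mirrors the call above.

```python
import pandas as pd

# Hypothetical sample data; not the real BankDS.
bank = pd.DataFrame({
    "Default": pd.Series(["no", "yes", "no"], dtype=object),
    "Y": pd.Series(["yes", "no", "yes"], dtype=object),
    "Age": [30, 41, 58],
})

# Outer keys are column names, inner dicts map old value -> new value.
mapping = {
    "Default": {"no": -1, "yes": 1},
    "Y": {"no": 0, "yes": 1},
}

# replace() returns a new DataFrame; columns not in the mapping are untouched.
converted = bank.replace(mapping)

print(converted["Default"].tolist())  # [-1, 1, -1]
print(converted["Y"].tolist())        # [1, 0, 1]
print(converted["Age"].tolist())      # [30, 41, 58]
```

The dtype of the converted columns may differ between pandas versions, so an explicit `.astype(int)` afterwards is a reasonable precaution if integer columns are required.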
  • In my case I had to use `(df.myCol.astype(float) > 60)` instead of `(float(df.myCol) > 60)` – LoMaPh Apr 08 '19 at 23:28