KeyError when using s.loc and s.first_valid_index()

Question

I have data similar to this post: pandas: Filling missing values within a group

That is, I have data in a number of observation sessions, and there is a focal individual for each session. That focal individual is only noted once, but I want to fill in the focal ID data for each line during that session. So, the data look something like this:

     Focal    Session
0    NaN      1
1    50101    1
2    NaN      1
3    NaN      2
4    50408    2
5    NaN      2

Based on the post linked above, I was using this code:

g = data.groupby('Session')

g['Focal'].transform(lambda s: s.loc[s.first_valid_index()])

But this raises a KeyError (specifically, KeyError: None). According to the .loc documentation, a KeyError can result when the requested label isn't found. I've checked, and while I have 152 sessions, I only have 150 non-null data points in the Focal column. Before I decide to manually search my data for which of the sessions is missing a Focal ID, I have two questions:

1. I am very much a beginner. So is this a reasonable explanation for why I am getting a KeyError?
2. If it is reasonable, is there a way to figure out which Session is missing Focal ID data, that will save me from manually looking through the data?

Output here:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-330-0e4f27aa7e14> in <module>()
----> 1 data['Focal'] = g['Focal'].transform(lambda s: s.loc[s.first_valid_index()])
      2 g['Focal'].transform(lambda s: s.loc[s.first_valid_index()])

//anaconda/lib/python2.7/site-packages/pandas/core/groupby.pyc in transform(self, func,     *args, **kwargs)
   1540         for name, group in self:
   1541             object.__setattr__(group, 'name', name)
-> 1542             res = wrapper(group)
   1543             # result[group.index] = res
   1544             indexer = self.obj.index.get_indexer(group.index)

//anaconda/lib/python2.7/site-packages/pandas/core/groupby.pyc in <lambda>(x)
   1536             wrapper = lambda x: getattr(x, func)(*args, **kwargs)
   1537         else:
-> 1538             wrapper = lambda x: func(x, *args, **kwargs)
   1539 
   1540         for name, group in self:

<ipython-input-330-0e4f27aa7e14> in <lambda>(s)
----> 1 data['Focal'] = g['Focal'].transform(lambda s: s.loc[s.first_valid_index()])
      2 g['Focal'].transform(lambda s: s.loc[s.first_valid_index()])

//anaconda/lib/python2.7/site-packages/pandas/core/indexing.pyc in __getitem__(self, key)
    669             return self._getitem_tuple(key)
    670         else:
--> 671             return self._getitem_axis(key, axis=0)
    672 
    673     def _getitem_axis(self, key, axis=0):

//anaconda/lib/python2.7/site-packages/pandas/core/indexing.pyc in _getitem_axis(self, key, axis)
    756             return self._getitem_iterable(key, axis=axis)
    757         else:
--> 758             return self._get_label(key, axis=axis)
    759 
    760 class _iLocIndexer(_LocationIndexer):

//anaconda/lib/python2.7/site-packages/pandas/core/indexing.pyc in _get_label(self, label, axis)
     58             return self.obj._xs(label, axis=axis, copy=False)
     59         except Exception:
---> 60             return self.obj._xs(label, axis=axis, copy=True)
     61 
     62     def _get_loc(self, key, axis=0):

//anaconda/lib/python2.7/site-packages/pandas/core/series.pyc in _xs(self, key, axis, level, copy)
    570 
    571     def _xs(self, key, axis=0, level=None, copy=True):
--> 572         return self.__getitem__(key)
    573 
    574     def _ixs(self, i, axis=0):

//anaconda/lib/python2.7/site-packages/pandas/core/series.pyc in __getitem__(self, key)
    611     def __getitem__(self, key):
    612         try:
--> 613             return self.index.get_value(self, key)
    614         except InvalidIndexError:
    615             pass

//anaconda/lib/python2.7/site-packages/pandas/core/index.pyc in get_value(self, series, key)
    761         """
    762         try:
--> 763             return self._engine.get_value(series, key)
    764         except KeyError, e1:
    765             if len(self) > 0 and self.inferred_type == 'integer':

//anaconda/lib/python2.7/site-packages/pandas/index.so in pandas.index.IndexEngine.get_value (pandas/index.c:2565)()

//anaconda/lib/python2.7/site-packages/pandas/index.so in pandas.index.IndexEngine.get_value (pandas/index.c:2380)()

//anaconda/lib/python2.7/site-packages/pandas/index.so in pandas.index.IndexEngine.get_loc (pandas/index.c:3166)()

KeyError: None

It's not the version of pandas, as each of the commands works with toy data. It's got to be an interaction with my data, probably missing data. — M.A.Kline, Sep 24 '13 at 17:27
You're totally correct (please check out my answer), the thing which works fine is your dummy data! :) — Andy Hayden, Sep 24 '13 at 17:31

score 1 · Accepted Answer · answered Sep 24 '13 at 06:01

The problem is that first_valid_index returns None if there are no valid values (some groups in your DataFrame are all NaN):

In [1]: import numpy as np; import pandas as pd

In [2]: s = pd.Series([np.nan])

In [3]: s.first_valid_index()  # returns None

Now, loc raises a KeyError because None is not a label in the index:

In [4]: s.loc[s.first_valid_index()]
KeyError: None
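On the second question (which Session has no Focal ID), here is a minimal sketch, assuming the column names from the question. The toy frame and its all-NaN session 3 are made up for illustration. It relies on count() ignoring NaN, so a per-group count of 0 marks a session with no valid Focal value.

```python
import numpy as np
import pandas as pd

# Toy data modeled on the question; session 3 is a hypothetical
# session in which no Focal ID was recorded.
data = pd.DataFrame({
    "Focal": [np.nan, 50101, np.nan, np.nan, 50408, np.nan, np.nan, np.nan],
    "Session": [1, 1, 1, 2, 2, 2, 3, 3],
})

# count() skips NaN, so a count of 0 means the whole group is NaN.
valid_counts = data.groupby("Session")["Focal"].count()
missing_sessions = valid_counts[valid_counts == 0].index.tolist()

print(missing_sessions)
```

On the real data this should list the sessions to inspect by hand, rather than scanning every row.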

What do you want your code to do in this particular case? ...
If you wanted it to be NaN, you could backfill and then take the first element:

g['Focal'].transform(lambda s: s.bfill().iloc[0])
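To see what the backfill approach does with an all-NaN group, here is a small sketch on toy data (session 3 is a made-up all-NaN session, not from the original post). Within each group, bfill() pulls the first valid value up to position 0, and a group with nothing valid simply stays NaN.

```python
import numpy as np
import pandas as pd

# Toy data; session 3 is hypothetical and has no Focal ID at all.
data = pd.DataFrame({
    "Focal": [np.nan, 50101, np.nan, np.nan, 50408, np.nan, np.nan, np.nan],
    "Session": [1, 1, 1, 2, 2, 2, 3, 3],
})

g = data.groupby("Session")

# For each session, backfill and take the first element; transform
# broadcasts that single value to every row of the session.
filled = g["Focal"].transform(lambda s: s.bfill().iloc[0])

print(filled)
```

Sessions 1 and 2 get their Focal ID on every row, while the rows of session 3 remain NaN instead of raising a KeyError.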

Andy Hayden

Yep. I was slow to comment on this one because I was working on it. Since the Focal ID data are crucial, I did not want to preserve the NaNs. I manually checked the data and found a couple of typos that produced the NaNs. After fixing those and reloading the data, the code I pulled from the earlier post works perfectly. So the answer to my q#1 above is "yes, it's your data" and to q#2 "meh, faster to fix it by hand." Thanks for the suggestion, I'll use it for data where NaN actually indicates no data collected. – M.A.Kline Sep 24 '13 at 17:55

score 0 · Answer 2 · answered Nov 21 '17 at 16:06

If you want to handle the case where some groups contain only NaN, you could do the following:

g = data.groupby('Session')
# Use a placeholder for all-NaN groups; otherwise take the first valid value.
data['Focal'] = g['Focal'].transform(lambda s: 'No values to aggregate' if s.isnull().all() else s.loc[s.first_valid_index()])

This way you fill in 'No values to aggregate' (or whatever you want) when a group contains only NaN, instead of stopping execution with an error.
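The same guard can also be written as a small named function, which may be easier to read than a long lambda. This is only a sketch: the helper name first_valid_or_default and its default parameter are hypothetical, and the toy data (with an all-NaN session 2) is made up. A numeric placeholder is used here so the column stays numeric.

```python
import numpy as np
import pandas as pd

def first_valid_or_default(s, default=np.nan):
    """Return the first non-null value of s, or `default` if s is all null.

    Hypothetical helper for illustration; not part of pandas.
    """
    idx = s.first_valid_index()  # None when every value is null
    return default if idx is None else s.loc[idx]

# Toy data; session 2 is hypothetical and has no Focal ID.
data = pd.DataFrame({
    "Focal": [np.nan, 50101, np.nan, np.nan, np.nan],
    "Session": [1, 1, 1, 2, 2],
})

# Extra keyword arguments to transform are passed through to the function.
filled = data.groupby("Session")["Focal"].transform(
    first_valid_or_default, default=-1
)

print(filled)
```

Checking the result of first_valid_index() for None addresses the exact condition that caused the KeyError, without a separate pass over the group to test whether it is all null.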

Hope this helps :)

Federico
