
I have a dataframe with some columns:

>>> np.random.seed(0xFEE7)
>>> df = pd.DataFrame({'A': np.random.randint(10, size=10), 
                       'B': np.random.randint(10, size=10),
                       'C': np.random.choice(['A', 'B'], size=10)})
>>> df
   A  B  C
0  0  0  B
1  4  0  B
2  6  6  A
3  8  3  B
4  0  2  A
5  8  4  A
6  4  1  B
7  8  7  A
8  4  4  A
9  1  1  A

I also have a boolean series that matches part of the index of df:

>>> g = df.groupby('C').get_group('A')
>>> ser = g['B'] > 5
>>> ser
2     True
4    False
5    False
7     True
8    False
9    False
Name: B, dtype: bool

I'd like to be able to use ser to set or extract data from df. For example:

>>> df.loc[ser, 'A'] -= 3
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\jfoxrabinovitz\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexing.py", line 1762, in __getitem__
    return self._getitem_tuple(key)
  File "C:\Users\jfoxrabinovitz\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexing.py", line 1289, in _getitem_tuple
    retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
  File "C:\Users\jfoxrabinovitz\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexing.py", line 1914, in _getitem_axis
    return self._getbool_axis(key, axis=axis)
  File "C:\Users\jfoxrabinovitz\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexing.py", line 1782, in _getbool_axis
    key = check_bool_indexer(labels, key)
  File "C:\Users\jfoxrabinovitz\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexing.py", line 2317, in check_bool_indexer
    raise IndexingError(
pandas.core.indexing.IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).

The error makes sense since ser is not the same length as df. How do I tell the dataframe to update the rows that match the index of ser and are set to True?

Specifically, I am looking to modify entries at indices 2 and 7 only:

>>> df   # after modification
   A  B  C
0  0  0  B
1  4  0  B
2  3  6  A
3  8  3  B
4  0  2  A
5  8  4  A
6  4  1  B
7  5  7  A
8  4  4  A
9  1  1  A
Mad Physicist
  • you may need `df.loc[ser[ser].index,'A']` to assign the values not the entire `ser.index` since other values are false, unsure of the expected - hence commenting :-) – anky Apr 21 '21 at 17:13
  • @anky. Thanks for pointing that out. I've updated the question, and am looking forward to your answer, since your comment is exactly what I want. – Mad Physicist Apr 21 '21 at 17:28
  • It's difficult for me to imagine a scenario where the above Series couldn't be created based on the entire DataFrame in the first place to get a like-indexed Boolean Series. I.e. the above is equivalent to the mask `df.C.eq('A') & df.B.gt(5)`. Further, if you have a duplicated index using `ser` can be problematic. – ALollz Apr 21 '21 at 17:36
  • @ALollz. Please write that as an answer. I like nothing more than to be XY'd – Mad Physicist Apr 21 '21 at 17:38
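
The like-indexed mask suggested in the comments can be sketched as follows. This is a minimal example that rebuilds the frame from the values printed in the question rather than from the random seed; the names `mask` and `out` are illustrative only.

```python
import pandas as pd

# Rebuild the question's frame from its printed values.
df = pd.DataFrame({
    'A': [0, 4, 6, 8, 0, 8, 4, 8, 4, 1],
    'B': [0, 0, 6, 3, 2, 4, 1, 7, 4, 1],
    'C': list('BBABAABAAA'),
})

# Build the condition on the whole frame, so the mask shares df's index.
mask = df['C'].eq('A') & df['B'].gt(5)

out = df.copy()
out.loc[mask, 'A'] -= 3  # only rows where both conditions hold are changed

print(out)
```

Because the mask is computed on df itself, it has one entry per row and needs no realignment before being passed to loc.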

2 Answers


You get that error because the index of ser does not match the index of the original dataframe.

You can solve it in two ways.

Either use Series.reindex with fill_value=False, so that the boolean mask is aligned with df.index, and then pass it to loc:

df.loc[ser.reindex(df.index,fill_value=False),'A'] = ... #setvalue

Or boolean-index ser with itself so that it keeps only the True values, then grab the resulting index and use it with loc:

df.loc[ser[ser].index,'A'] = ... #setvalue
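
As a minimal sketch, here are both options applied to the question's data. The frame is rebuilt from the values printed in the question rather than from the random seed, and `df1`, `df2`, and `mask` are names introduced only for this illustration.

```python
import pandas as pd

# Rebuild the question's frame from its printed values.
df = pd.DataFrame({
    'A': [0, 4, 6, 8, 0, 8, 4, 8, 4, 1],
    'B': [0, 0, 6, 3, 2, 4, 1, 7, 4, 1],
    'C': list('BBABAABAAA'),
})
ser = df.groupby('C').get_group('A')['B'] > 5

# Option 1: align the mask to df.index; rows missing from ser become False.
df1 = df.copy()
mask = ser.reindex(df1.index, fill_value=False)
df1.loc[mask, 'A'] -= 3

# Option 2: keep only the True entries of ser and select by their labels.
df2 = df.copy()
df2.loc[ser[ser].index, 'A'] -= 3

print(df1)
print(df1.equals(df2))
```

Both copies should end up with A changed only at index 2 and 7, which is the result the question asks for.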
anky

I guess you could just pass ser.index to loc, since ser and df share a common index.

df.loc[ser.index, 'A'] -= 3

As @Shubham Sharma commented, the OP wants to modify only the rows where ser is True. The approach above instead selects every index in ser, that is, all rows where C is 'A'.

@anky provided a way to do that:

df.loc[ser[ser].index, 'A'] -= 3
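
To make the difference between the two selections concrete, here is a small sketch on the question's data. The frame is rebuilt from the printed values, and `all_labels` and `true_labels` are illustrative names.

```python
import pandas as pd

# Rebuild the question's frame from its printed values.
df = pd.DataFrame({
    'A': [0, 4, 6, 8, 0, 8, 4, 8, 4, 1],
    'B': [0, 0, 6, 3, 2, 4, 1, 7, 4, 1],
    'C': list('BBABAABAAA'),
})
ser = df.groupby('C').get_group('A')['B'] > 5

all_labels = ser.index        # every row in group 'A', True or False
true_labels = ser[ser].index  # only the rows where ser is True

print(list(all_labels))
print(list(true_labels))
```

Here ser.index holds the labels 2, 4, 5, 7, 8, 9, while ser[ser].index holds only 2 and 7, so only the second form restricts the update to the rows the OP asked about.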
viniciusrf1992