
I'm trying to set a number of different values in a pandas DataFrame all to the same value. I thought I understood boolean indexing for pandas, but I haven't found any resources on this specific error.

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'f']})
mask = df.isin([1, 3, 12, 'a'])  # True wherever a cell matches one of these values
df[mask] = 30                    # this line raises the error below
Traceback (most recent call last):
...
TypeError: Cannot do inplace boolean setting on mixed-types with a non np.nan value

Above, I want to set every entry of df where the mask is True to the value 30, and leave the False entries unchanged.

I could do df.replace instead, but masking feels a bit more efficient and intuitive here. Can someone explain the error, and provide an efficient way to set all of the values?
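
For reference, the replace route mentioned above would look roughly like this (just a sketch of the fallback, not the masking approach being asked about):

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'f']})
df = df.replace([1, 3, 12, 'a'], 30)  # every listed value becomes 30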

Michael K
  • Could you confirm whether my answer or JohnE's result is what you desired, and update your question to clearly indicate this? Thanks – EdChum May 29 '15 at 12:33
  • Thanks, @EdChum. I wanted to set the `True` values in the mask and leave the `False` values the same. I edited the question for clarity. – Michael K May 29 '15 at 17:31
  • No worries, I updated my answer, you just invert the mask to achieve what you want – EdChum May 29 '15 at 17:35
  • Hey, your call but I didn't mean for you to switch the checkmark, just trying to clarify desired results. I think @EdChum's answer is a bit cleaner and more concise if you want to switch it back to his. – JohnE May 29 '15 at 17:43
  • Yeah, you're right. They're both good answers. Sorry all! – Michael K May 29 '15 at 17:44

4 Answers


Unfortunately you can't use the boolean mask on mixed dtypes for this; you can use pandas where to set the values instead:

In [59]:
df = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'f']})
mask = df.isin([1, 3, 12, 'a'])
df = df.where(mask, other=30)
df

Out[59]:
    A   B
0   1   a
1  30  30
2   3  30

Note that the above will fail if you pass inplace=True to the where method, so df.where(mask, other=30, inplace=True) will raise:

TypeError: Cannot do inplace boolean setting on mixed-types with a non np.nan value

EDIT

OK, after a little misunderstanding, you can still use where by just inverting the mask:

In [2]:    
df = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'f']})
mask = df.isin([1, 3, 12, 'a'])
df.where(~mask, other=30)

Out[2]:
    A   B
0  30  30
1   2   b
2  30   f
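
Equivalently, DataFrame.mask is the complement of where (it replaces values where the condition is True), so you can skip the inversion; a minimal sketch of the same idea:

df = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'f']})
mask = df.isin([1, 3, 12, 'a'])
df.mask(mask, 30)  # same result as df.where(~mask, other=30)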
EdChum
  • Are the results right? Should 30 be filled in for True or for False values? It's the opposite of what I thought was asked for (although I could have it backwards) and it could be easily reversed by taking the complement of the mask. – JohnE May 29 '15 at 12:22
  • @JohnE when you use `where`, the result keeps the original values where the mask is `True`, and the `other` value will be used where the mask is `False`; the default for `other` is `NaN`, so it looks confusing but this is the expected and desired result – EdChum May 29 '15 at 12:26
  • I agree that's how your code works and is supposed to, I'm just noting that it seems to be the opposite of what was asked for. If you run the OP's code on just column A, for example, it produces `30 2 30`. – JohnE May 29 '15 at 12:29
  • @JohnE yes this does seem ambiguous. I read his question as a way to replace `NaN` values, but it reads more like your answer; the OP accepted my answer, so it's unclear to me whether he wanted this result or the one you describe – EdChum May 29 '15 at 12:33
  • Yep, no worries. I just noticed we had exactly opposite results. I think your approach is cleaner and upvoted it. – JohnE May 29 '15 at 12:36
  • Is the double negation really needed? I want to set "everything at a mask" to a value. I should mask, and then set everything not-not at the mask to the value? Makes no sense to me. – Gulzar Apr 27 '21 at 15:09

If you want to use different columns to create your mask, you need to use the values property of the DataFrame, so the mask is a plain array and its column labels are not aligned against the columns you are replacing.


Example

Let's say we want to replace values in A_1 and A_2 according to a mask built from B_1 and B_2. For example, replace those values in the A columns (with 999) that correspond to nulls in the B columns.

The original dataframe:

   A_1  A_2  B_1  B_2
0    1    4    y    n
1    2    5    n  NaN
2    3    6  NaN  NaN

The desired dataframe

   A_1  A_2  B_1  B_2
0    1    4    y    n
1    2  999    n  NaN
2  999  999  NaN  NaN

The code:

import numpy as np
import pandas as pd

df = pd.DataFrame({
     'A_1': [1, 2, 3],
     'A_2': [4, 5, 6],
     'B_1': ['y', 'n', np.nan],
     'B_2': ['n', np.nan, np.nan]})

# .values gives a plain boolean ndarray, so there is no label alignment
_mask = df[['B_1', 'B_2']].notnull().values
df[['A_1', 'A_2']] = df[['A_1', 'A_2']].where(_mask, other=999)



   A_1  A_2
0    1    4
1    2  999
2  999  999
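
For comparison, the same replacement can be written with np.where, reusing df and _mask from the block above (just a sketch of an equivalent formulation):

# the ndarray mask has no column labels, so np.where applies it positionally
df[['A_1', 'A_2']] = np.where(_mask, df[['A_1', 'A_2']], 999)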
toto_tico
  • Thank you for this answer. The .values property was exactly what I needed in order to make my mask work. Very much appreciated! – Tim Mayes Sep 24 '19 at 02:05

I'm not 100% sure but I suspect the error message relates to the fact that there is not identical treatment of missing data across different dtypes. Only float has NaN, but integers can be automatically converted to floats so it's not a problem there. But it appears mixing number dtypes and object dtypes does not work so easily...

Regardless of that, you could get around it pretty easily with np.where:

import numpy as np

df[:] = np.where(mask, 30, df)  # 30 where the mask is True, the original value elsewhere

    A   B
0  30  30
1   2   b
2  30   f
JohnE

pandas uses NaN to mark invalid or missing data, and NaN can be used across dtypes. Since your DataFrame has mixed int and string data types, it will not accept assignment of a single value (other than NaN) through the boolean mask, as this would create a mixed type (int and str) in B through an in-place assignment.
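
That is also what the error message hints at: assigning np.nan through the same mask is allowed, while any other value raises (a sketch, assuming the error message is literal; newer pandas versions may accept other values as well):

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'f']})
mask = df.isin([1, 3, 12, 'a'])

df[mask] = np.nan  # NaN is valid across the mixed dtypes
# df[mask] = 30    # raises the TypeError from the question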

@JohnE's method using np.where creates a new DataFrame in which the type of column B is object, not string as in the initial example.
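
A quick way to check is to compare df.dtypes before and after the assignment (a sketch; the exact outcome depends on the pandas version):

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'f']})
mask = df.isin([1, 3, 12, 'a'])

print(df.dtypes)                # before: A is int64, B is object
df[:] = np.where(mask, 30, df)  # round-trips the values through an object ndarray
print(df.dtypes)                # after: see what the assignment did to each column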

Paul Joireman