347

I have a pandas Series object containing boolean values. How can I get a series containing the logical NOT of each value?

For example, consider a series containing:

True
True
True
False

The series I'd like to get would contain:

False
False
False
True

This seems like it should be reasonably simple, but apparently I've misplaced my mojo =(

smci
Louis Thibault
  • 4
    It is important that the data does not contain `object` types for the answers below to work, so use: `~ df.astype('bool')` – LearnOPhile Sep 11 '17 at 13:01
  • 1
    I've written about all of the logical operators in [this post](https://stackoverflow.com/a/54358361/4909087). The post also includes alternatives. – cs95 Jan 25 '19 at 03:07

6 Answers

410

To invert a boolean Series, use ~s:

In [7]: s = pd.Series([True, True, False, True])

In [8]: ~s
Out[8]: 
0    False
1    False
2     True
3    False
dtype: bool

Using Python2.7, NumPy 1.8.0, Pandas 0.13.1:

In [119]: s = pd.Series([True, True, False, True]*10000)

In [10]:  %timeit np.invert(s)
10000 loops, best of 3: 91.8 µs per loop

In [11]: %timeit ~s
10000 loops, best of 3: 73.5 µs per loop

In [12]: %timeit (-s)
10000 loops, best of 3: 73.5 µs per loop

As of Pandas 0.13.0, Series are no longer subclasses of numpy.ndarray; they are now subclasses of pd.NDFrame. This might have something to do with why np.invert(s) is no longer as fast as ~s or -s.

Caveat: timeit results may vary depending on many factors including hardware, compiler, OS, Python, NumPy and Pandas versions.
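Given that caveat, it may be worth rerunning the comparison on your own setup. The following is a minimal sketch using the standard-library `timeit` module instead of the IPython `%timeit` magic; the repetition count of 200 is an arbitrary assumption, not something taken from the measurements above.

```python
import timeit

import numpy as np
import pandas as pd

s = pd.Series([True, True, False, True] * 10000)

# Number of calls per measurement; an arbitrary choice for this sketch.
N_CALLS = 200

# Time each spelling of the inversion on the same Series.
timings = {
    "~s": timeit.timeit(lambda: ~s, number=N_CALLS),
    "np.invert(s)": timeit.timeit(lambda: np.invert(s), number=N_CALLS),
}

for label, seconds in timings.items():
    print(f"{label}: {seconds / N_CALLS * 1e6:.1f} µs per call")
```

The absolute numbers will differ from those quoted above; only the relative ordering on your own machine is informative.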

unutbu
  • Duly noted. Other than being much slower, what's the difference between the tilde and `-` ? – Louis Thibault Apr 14 '13 at 12:38
  • 1
    Weird, I actually tested the `tilde` as it was mentioned in the documentation, but it didn't perform the same as `np.invert` :S – root Apr 14 '13 at 13:11
  • @blz: At least on my Ubuntu machine, running NumPy 1.6.2, the performance of `np.invert(s)`, `~s` and `-s` are all the same. – unutbu Apr 14 '13 at 13:47
  • @root: I'm not sure why there is such a great discrepancy in our timeit results, but it certainly can happen. What OS and version of NumPy are you using? – unutbu Apr 14 '13 at 13:49
  • Also on Ubuntu, but using NumPy 1.7.0...(`np.bitwise_not(s)` performs the same as `np.invert`). – root Apr 14 '13 at 13:50
  • @root, @unutbu, I can confirm that `np.invert` and the `~` operator have identical performance on my machine as well: numpy 1.6.2 on Ubuntu latest. – Louis Thibault Apr 14 '13 at 13:54
  • Where does the Pandas documentation tell about the `~` operator? – Robert Pollak Oct 28 '16 at 13:35
  • @RobertPollak: It is mentioned [here](http://pandas.pydata.org/pandas-docs/stable/indexing.html#boolean-indexing). – unutbu Oct 28 '16 at 18:59
  • For completeness' sake, consider adding `%timeit s == False` - it's about twice as slow as the slowest contender, but it paints a more complete picture, IMHO :) – Olsgaard Jun 09 '20 at 15:33
70

@unutbu's answer is spot on. I just wanted to add a warning that your mask needs to be dtype bool, not 'object', i.e. your mask can't ever have contained any NaNs. As the example below shows, even if your mask is NaN-free now, it will remain 'object' type.

Inverting an 'object' series won't throw an error. Instead, you'll get a garbage mask of ints that won't work as you expect.

In[1]: df = pd.DataFrame({'A':[True, False, np.nan], 'B':[True, False, True]})
In[2]: df.dropna(inplace=True)
In[3]: df['A']
Out[3]:
0    True
1   False
Name: A, dtype: object
In[4]: ~df['A']
Out[4]:
0   -2
1   -1
Name: A, dtype: object

After speaking with colleagues about this one I have an explanation: It looks like pandas is reverting to the bitwise operator:

In [1]: ~True
Out[1]: -2

As @geher says, you can convert it to bool with astype before you invert it with ~. Note that the cast has to happen before the inversion, as the two results below show:

~df['A'].astype(bool)
0    False
1     True
Name: A, dtype: bool
(~df['A']).astype(bool)
0    True
1    True
Name: A, dtype: bool
JSharm
  • in your example, the output ints mask can be converted to the bool series you want with `.astype(bool)` e.g. `~df['A'].astype(bool)` – geher Feb 06 '20 at 10:35
  • 1
    This is working because `astype(bool)` is happening before the `~`: `~df['A'].astype(bool)` vs `(~df['A']).astype(bool)` – JSharm Feb 06 '20 at 11:20
21

I just gave it a shot:

In [9]: s = Series([True, True, True, False])

In [10]: s
Out[10]: 
0     True
1     True
2     True
3    False

In [11]: -s
Out[11]: 
0    False
1    False
2    False
3     True
herrfz
7

You can also use numpy.invert:

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: s = pd.Series([True, True, False, True])

In [4]: np.invert(s)
Out[4]: 
0    False
1    False
2     True
3    False

EDIT: The difference in performance appears on Ubuntu 12.04, Python 2.7, NumPy 1.7.0 - doesn't seem to exist using NumPy 1.6.2 though:

In [5]: %timeit (-s)
10000 loops, best of 3: 26.8 us per loop

In [6]: %timeit np.invert(s)
100000 loops, best of 3: 7.85 us per loop

In [7]: %timeit ~s
10000 loops, best of 3: 27.3 us per loop
root
  • This may not hold on a different platform. On Win 7 with Python 3.6.3, NumPy 1.13.3 and pandas 0.20.3, `(-s)` is the fastest, `(~s)` is second, and `np.invert(s)` is the slowest. – gaozhidf Apr 08 '18 at 01:25
1

In support of the excellent answers here, and for future convenience: there may be cases where you want to flip the truth values in a column while leaving other values (NaN values, for instance) unchanged.

In[1]: series = pd.Series([True, np.nan, False, np.nan])
In[2]: series = series[series.notna()] #remove nan values
 
In[3]: series # without nan                                            
Out[3]: 
0     True
2    False
dtype: object

# Out[4] expected to be inverse of Out[3], pandas applies bitwise complement 
# operator instead as in `lambda x : (-1*x)-1`

In[4]: ~series
Out[4]: 
0    -2
2    -1
dtype: object

As a simple non-vectorized solution, you can just:

1. check the type of each value
2. invert only the bools

In[1]: series = pd.Series([True, np.nan, False, np.nan])

In[2]: series.apply(lambda x: not x if isinstance(x, bool) else x)
Out[2]: 
0    False
1      NaN
2     True
3      NaN
dtype: object
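
A vectorized alternative, not mentioned in the answers above, is pandas' nullable `"boolean"` extension dtype. This sketch assumes pandas 1.0 or later, where that dtype is available. Missing values become `<NA>` and are left in place by `~`.

```python
import numpy as np
import pandas as pd

series = pd.Series([True, np.nan, False, np.nan])  # object dtype

# Convert to the nullable boolean dtype; NaN becomes <NA>.
nullable = series.astype("boolean")

# ~ flips True/False and leaves <NA> entries missing.
flipped = ~nullable
print(flipped)
```

The result keeps the `"boolean"` dtype, so later code that expects a plain `bool` mask still has to decide how to treat the missing entries.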
0

NumPy is slower because it casts the input to boolean values (so None and 0 become False and everything else becomes True).

import pandas as pd
import numpy as np
s = pd.Series([True, None, False, True])
np.logical_not(s)

gives you

0    False
1     True
2     True
3    False
dtype: object

whereas ~s would raise a TypeError, because None does not support the ~ operator. In most cases tilde would be a safer choice than NumPy.

Pandas 0.25, NumPy 1.17
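
The contrast can be checked side by side with the sketch below. It assumes the same object-dtype Series as above; the variable names `via_numpy` and `tilde_failed` are illustrative only, and behaviour may differ on versions other than the ones listed.

```python
import numpy as np
import pandas as pd

s = pd.Series([True, None, False, True])  # object dtype because of None

# np.logical_not goes by truthiness, so None is treated like False.
via_numpy = np.logical_not(s)

# ~ is applied to each Python object in turn, and None has no ~ operator.
try:
    ~s
    tilde_failed = False
except TypeError:
    tilde_failed = True

print(via_numpy.tolist())
print("~s raised TypeError:", tilde_failed)
```

Failing loudly is arguably the safer behaviour here, since a missing value silently turning into True can hide a data problem.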

grofte