
I have a Series like this after calling groupby('name') and taking mean() of another column:

name
383      3.000000
663      1.000000
726      1.000000
737      9.000000
833      8.166667

Could anyone please show me how to filter out the rows whose mean value is 1.000000? Thank you, I greatly appreciate your help.

– Kiem Nguyen

7 Answers

In [5]:

import pandas as pd

test = {
    383: 3.000000,
    663: 1.000000,
    726: 1.000000,
    737: 9.000000,
    833: 8.166667,
}

s = pd.Series(test)
s = s[s != 1]  # boolean mask: keep only the values not equal to 1
s

Out[5]:
383    3.000000
737    9.000000
833    8.166667
dtype: float64
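
If you are starting from the original DataFrame, the same boolean mask can be applied straight to the groupby result. A minimal sketch, where df, 'name' and 'value' are hypothetical names standing in for the question's data:

import pandas as pd

# hypothetical data shaped like the question's
df = pd.DataFrame({
    'name':  [383, 383, 663, 726, 737, 833],
    'value': [2.0, 4.0, 1.0, 1.0, 9.0, 8.166667],
})

means = df.groupby('name')['value'].mean()
means[means != 1]  # drop groups whose mean is exactly 1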
– Kamil Sindi, Andrew
  • I prefer the answers below because they can be chained (i.e. no need to define `s` and then use it twice in the expression). Only works from pandas 0.18 though. – IanS May 09 '17 at 07:56
  • Also see timing comparisons in piRSquared's [answer](http://stackoverflow.com/a/43401268/5276797). – IanS May 09 '17 at 07:58

From pandas 0.18 onwards, filtering a Series can also be done as below:

import pandas as pd

test = {
    383: 3.000000,
    663: 1.000000,
    726: 1.000000,
    737: 9.000000,
    833: 8.166667,
}

pd.Series(test).where(lambda x: x != 1).dropna()
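
Note that where keeps the original shape and replaces non-matching values with NaN, which is why the trailing dropna is needed; the chained result is the same filtered Series:

383    3.000000
737    9.000000
833    8.166667
dtype: float64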

Check out: http://pandas.pydata.org/pandas-docs/version/0.18.1/whatsnew.html#method-chaininng-improvements

– Leo Ufimtsev, DACW
  • So much nicer with method chaining (and reminds me of Spark). – Dylan Hogg Aug 08 '17 at 09:14
  • True, but Spark does something more intuitive in this case: it simply gets rid of rows that don't match the predicate, meaning there is no need for the `.dropna()` part, which seemed clearly superfluous to me until I read the doc. Got bitten by that :D – Florent Moiny Mar 17 '20 at 13:53

As DACW pointed out, there are method-chaining improvements in pandas 0.18.1 that do what you are looking for very nicely.

Rather than using .where, you can pass your function to either the .loc indexer or the Series indexer [] and avoid the call to .dropna:

import pandas as pd

test = pd.Series({
    383: 3.000000,
    663: 1.000000,
    726: 1.000000,
    737: 9.000000,
    833: 8.166667,
})

test.loc[lambda x: x != 1]

test[lambda x: x != 1]

Similar behavior is supported on the DataFrame and NDFrame classes.
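
For instance, the same callable style carries over to DataFrame indexing through .loc; a minimal sketch, putting the data above in a hypothetical one-column frame named df:

df = test.to_frame(name='x')
df.loc[lambda d: d['x'] != 1]  # callable receives the DataFrame, returns a boolean mask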

– Nate Anderson, Gordon Bean
  • This is my favorite answer, and it also seems to be the fastest without going down to numpy (see timing comparisons). – IanS May 09 '17 at 08:04

A fast way of doing this is to reconstruct using numpy to slice the underlying arrays. See timings below.

mask = s.values != 1  # boolean mask over the underlying numpy array
pd.Series(s.values[mask], s.index[mask])

383    3.000000
737    9.000000
833    8.166667
dtype: float64

naive timing (benchmark plot from the original answer not reproduced)
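
As a rough sketch, a similar comparison can be rerun with timeit; the series length and repeat count here are arbitrary choices:

import timeit
import numpy as np
import pandas as pd

s = pd.Series(np.random.rand(100_000))

def rebuild_with_numpy():
    mask = s.values != 1
    return pd.Series(s.values[mask], s.index[mask])

print(timeit.timeit(rebuild_with_numpy, number=100))  # numpy reconstruction
print(timeit.timeit(lambda: s[s != 1], number=100))   # plain boolean indexing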

– piRSquared

Another way is to first convert to a DataFrame and use the query method (assuming you have numexpr installed):

import pandas as pd

test = {
    383: 3.000000,
    663: 1.000000,
    726: 1.000000,
    737: 9.000000,
    833: 8.166667,
}

s = pd.Series(test)
s.to_frame(name='x').query("x != 1")
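
Note that query returns a DataFrame, so if you want a Series back at the end of the chain you can select the column again:

s.to_frame(name='x').query("x != 1")['x']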
– Kamil Sindi

If you like chained operations, you can also use the compress function:

import pandas as pd

test = pd.Series({
    383: 3.000000,
    663: 1.000000,
    726: 1.000000,
    737: 9.000000,
    833: 8.166667,
})

test.compress(lambda x: x != 1)

# 383    3.000000
# 737    9.000000
# 833    8.166667
# dtype: float64
– Psidom
  • Please note `pandas.Series.compress` is deprecated since version 0.24.0 of `pandas`. – ipj Jan 18 '21 at 17:56
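
On versions where compress has since been removed, Series.pipe gives a similarly chainable spelling of the plain boolean mask; a sketch using the same data:

test.pipe(lambda s: s[s != 1])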

In my case I had a pandas Series where the values are tuples of characters:

Out[67]
0    (H, H, H, H)
1    (H, H, H, T)
2    (H, H, T, H)
3    (H, H, T, T)
4    (H, T, H, H)

Therefore I could use indexing to filter the Series, but to create the Boolean index I needed apply. My condition is "find all tuples which have exactly one 'H'".

series_of_tuples[series_of_tuples.apply(lambda x: x.count('H')==1)]

I admit it is not "chainable" (i.e. notice I repeat series_of_tuples twice; you must store any temporary Series in a variable so you can call apply(...) on it).

There may also be other methods (besides .apply(...)) which can operate elementwise to produce a Boolean index.
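
For example, Series.map applies its callable to each element, so it builds the same Boolean index as the apply version above:

series_of_tuples[series_of_tuples.map(lambda x: x.count('H') == 1)]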

Many other answers (including the accepted answer) use chainable functions like:

  • .compress()
  • .where()
  • .loc[]
  • []

These accept callables (lambdas) which are applied to the Series, not to the individual values in those series!

Therefore my Series of tuples behaved strangely when I tried to use the above condition/callable/lambda with any of the chainable functions, like .loc[]:

series_of_tuples.loc[lambda x: x.count('H')==1]

Produces the error:

KeyError: 'Level H must be same as name (None)'

I was very confused, but it seems the lambda ends up invoking the Series.count method (series_of_tuples.count(...)), which treats 'H' as an index level name rather than a value to count; that is not what I wanted.

I admit that an alternative data structure may be better:

  • A Category datatype?
  • A DataFrame (each element of the tuple becomes a column; see the sketch at the end of this answer)
  • A Series of strings (just concatenate the tuples together):

This creates a Series of strings (i.e. by concatenating each tuple; joining its characters into a single string):

series_of_tuples.apply(''.join)

So I can then use the chainable Series.str.count:

series_of_tuples.apply(''.join).str.count('H')==1
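
For completeness, a sketch of the DataFrame alternative mentioned in the list above, assuming the four-character tuples shown earlier:

import pandas as pd

# each tuple element becomes a column (0..3)
df = pd.DataFrame(series_of_tuples.tolist(), index=series_of_tuples.index)
mask = (df == 'H').sum(axis=1) == 1  # rows whose tuple contains exactly one 'H'
series_of_tuples[mask]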
– MarianD, Nate Anderson