Is there a query method or similar for pandas Series (pandas.Series.query())?

Question

The pandas.DataFrame.query() method is of great usage for (pre/post)-filtering data when loading or plotting. It comes particularly handy for method chaining.

I find myself often wanting to apply the same logic to a pandas.Series, e.g. after having done a method such as df.value_counts which returns a pandas.Series.

Example

Lets assume there is a huge table with the columns Player, Game, Points and I want to plot a histogram of the players with more than 14 times 3 points. I first have to sum the points of each player (groupby -> agg) which will return a Series of ~1000 players and their overall points. Applying the .query logic it would look something like this:

df = pd.DataFrame({
    'Points': [random.choice([1,3]) for x in range(100)], 
    'Player': [random.choice(["A","B","C"]) for x in range(100)]})

(df
     .query("Points == 3")
     .Player.values_count()
     .query("> 14")
     .hist())

The only solutions I find force me to do an unnecessary assignment and break the method chaining:

(points_series = df
     .query("Points == 3")
     .groupby("Player").size()
points_series[points_series > 100].hist()

Method chaining as well as the query method help to keep the code legible meanwhile the subsetting-filtering can get messy quite quickly.

# just to make my point :)
series_bestplayers_under_100[series_prefiltered_under_100 > 0].shape

Please help me out of my dilemma! Thanks

I'm not sure that SO is the best place for this question as it's a question for the developers of that library, it's better to post on [github](https://github.com/pandas-dev/pandas/issues) IMO — EdChum, Oct 21 '16 at 08:17
Here is an [issue](https://github.com/pandas-dev/pandas/issues/22347) in the pandas github discussing this. — Erik, May 06 '22 at 11:30

score 19 · Accepted Answer · edited Oct 29 '21 at 14:39

If I understand correctly you can add query("Points > 100"):

df = pd.DataFrame({'Points':[50,20,38,90,0, np.Inf],
                   'Player':['a','a','a','s','s','s']})

print (df)
  Player     Points
0      a  50.000000
1      a  20.000000
2      a  38.000000
3      s  90.000000
4      s   0.000000
5      s        inf

points_series = df.query("Points < inf").groupby("Player").agg({"Points": "sum"})['Points']
print (points_series)     
a = points_series[points_series > 100]
print (a)     
Player
a    108.0
Name: Points, dtype: float64


points_series = df.query("Points < inf")
                  .groupby("Player")
                  .agg({"Points": "sum"})
                  .query("Points > 100")

print (points_series)     
        Points
Player        
a        108.0

Another solution is Selection By Callable:

points_series = df.query("Points < inf")
                  .groupby("Player")
                  .agg({"Points": "sum"})['Points']
                  .loc[lambda x: x > 100]

print (points_series)     
Player
a    108.0
Name: Points, dtype: float64

Edited answer by edited question:

np.random.seed(1234)
df = pd.DataFrame({
    'Points': [np.random.choice([1,3]) for x in range(100)], 
    'Player': [np.random.choice(["A","B","C"]) for x in range(100)]})

print (df.query("Points == 3").Player.value_counts().loc[lambda x: x > 15])
C    19
B    16
Name: Player, dtype: int64

print (df.query("Points == 3").groupby("Player").size().loc[lambda x: x > 15])
Player
B    16
C    19
dtype: int64

Oh wow, bad example from my part! The groupby actually returns a `DataFrame`. But e.g. a `pd.DataFrame.value_counts()` returns a series where your solution comes in handy! Selection By Callable, I was not aware of that - thank you — dmeu, Oct 21 '16 at 08:52
Ok, I add answer for yur new problem - callable are rare used, because it is new functionality. — jezrael, Oct 21 '16 at 09:15

score 11 · Answer 2 · answered Dec 05 '18 at 09:35

11

Instead of query you can use pipe:

s.pipe(lambda x: x[x>0]).pipe(lambda x: x[x<10])

answered Dec 05 '18 at 09:35

Ilya Prokin

684
6
11

1

This only gives you the boolean mask. It does not query/filter the series. – Michele Piccolini Mar 12 '21 at 18:00
1

@MichelePiccolini, are you sure? It does filter the series for me. – iruvar Feb 23 '22 at 18:02
1

Yes, it does filter, it doesn't return the Boolean mask. Be careful, the lambda expression is `x[x>0]`, not just `x>0`. – Denziloe Apr 21 '22 at 00:37
This .pipe() method is a big eye opener to me. – mgouin Jun 13 '23 at 15:54

score 10 · Answer 3 · answered Nov 15 '16 at 12:44

Why not convert from Series to DataFrame, do the querying, and then convert back.

df["Points"] = df["Points"].to_frame().query('Points > 100')["Points"]

Here, .to_frame() converts to DataFrame, while the trailing ["Points"] converts to Series.

The method .query() can then be used consistently whether or not the Pandas object has 1 or more columns.

Is there a query method or similar for pandas Series (pandas.Series.query())?

Example

3 Answers3

Linked