
I would like to filter the result of a pandas groupby directly, without first storing that result in a variable. For example:

import pandas as pd

df = pd.DataFrame([("a", 1)]*3 + [("b", 1)]*2 + [("c", 1)], columns=["title", "counts"])

res = df.groupby("title").agg({"counts": "sum"})  # I want to skip creating res

my_res = res.loc[res.counts > 2]

In the above example, I would like to create `my_res` with a one-liner. In Spark/Scala this can be achieved simply by chaining a filter operation, but in pandas `filter` has a different purpose.
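To illustrate why `filter` does not help here, the following is a minimal sketch using the same sample data. It assumes the two relevant methods are `DataFrame.filter`, which selects by label rather than by value, and `GroupBy.filter`, which keeps or drops whole groups and returns the original rows without aggregating them. The variable names `cols` and `kept` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame([("a", 1)]*3 + [("b", 1)]*2 + [("c", 1)], columns=["title", "counts"])

# DataFrame.filter picks rows or columns by their labels; it never looks at values.
cols = df.filter(items=["counts"])

# GroupBy.filter keeps every original row of each group that passes the predicate.
# The result is NOT aggregated: group "a" comes back as three separate rows.
kept = df.groupby("title").filter(lambda g: g["counts"].sum() > 2)

print(cols.columns.tolist())
print(kept)
```

Neither call produces the aggregated, value-filtered frame that `my_res` holds.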

cs95
geompalik
  • `df.groupby("title").agg({"counts":"sum"}).query('counts > 2')` – cs95 Feb 06 '19 at 10:51
  • I would also recommend taking a look at [this](https://stackoverflow.com/q/53779986/4909087) post of mine. – cs95 Feb 06 '19 at 10:52
  • Thank you, it does the work. If you want to post an answer, I will accept it; I will check the recommended post also – geompalik Feb 06 '19 at 10:55

1 Answer


Use `query` to chain the filtering step directly onto the aggregation:

df.groupby("title").agg({"counts":"sum"}).query('counts > 2')

       counts
title        
a           3
cs95
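For readers who prefer not to write the condition as a string, a possible alternative is to pass a callable to `.loc`, which pandas evaluates against the intermediate result. The sketch below uses the sample data from the question and compares it with the `query` approach. The names `via_query`, `via_loc` and `via_series` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame([("a", 1)]*3 + [("b", 1)]*2 + [("c", 1)], columns=["title", "counts"])

# The chained query from the answer above.
via_query = df.groupby("title").agg({"counts": "sum"}).query("counts > 2")

# Same filter with a callable: .loc calls the lambda on the aggregated frame.
via_loc = df.groupby("title").agg({"counts": "sum"}).loc[lambda d: d["counts"] > 2]

# If a Series is enough, select the column before aggregating.
via_series = df.groupby("title")["counts"].sum().loc[lambda s: s > 2]

print(via_loc)
print(via_series)
```

The callable form may be convenient when the column name is not a valid Python identifier or when the threshold comes from a local variable, although `query` can also reference local variables with the `@` prefix.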