1

I have 3 coulms, isTrade (indicating if a quote or a trade by 0/1), daetTime (time of the event), and tradeBid (how much volume traded. I have values and 0 which I can not make null as I am counting and aggregating it).

I am trying to groupby only the trades (['isTrade'] == 1) and to display the sum (np.sum)but i also want to count the amount of them.

Is there any way to use lambda with countif (lambda x: x.isnull().count() returns count of the 0 values as well)?

I tried to use:

groups = df[df['isTrade'] == 1].groupby('dateTime')                         
grouped = (groups.agg({'tradeBid': [np.sum,lambda x: (x=='tradeBid').count()],})) 

but I get an error:

raise TypeError("invalid type comparison")

I tried with 'count' / 'nunique' but it counts the 0 as well and not what is <>0 All the answers I found or here refer to the parent condition as I already do here with isTrade==1

Edited: The expected results will be per dateTime, sum of column tradeBi, count of values > 1

Added sample file of data:

SecurityID,dateTime,ask1,ask1Volume,bid1,bid1Volume,ask2,ask2Volume,bid2,bid2Volume,ask3,ask3Volume,bid3,bid3Volume,tradePrice,tradeBid,isTrade
2318276,2017-11-20 08:00:09.052240,12869.0,1,12868.0,3,12870.0,19,12867.5,2,12872.5,2,12867.0,1,0.0,0,0
2318276,2017-11-20 08:00:09.052260,12869.0,1,12868.0,3,12870.0,19,12867.5,2,12872.5,2,12867.0,1,12861.0,1,1
2318276,2017-11-20 08:00:09.052260,12869.0,1,12868.0,2,12870.0,19,12867.5,2,12872.5,2,12867.0,1,12868.0,1,0
2318276,2017-11-20 08:00:09.052270,12869.0,1,12868.0,2,12870.0,19,12867.5,2,12872.5,2,12867.0,1,12868.0,1,1
2318276,2017-11-20 08:00:09.052270,12869.0,1,12868.0,1,12870.0,19,12867.5,2,12872.5,2,12867.0,1,12868.0,1,0
2318276,2017-11-20 08:00:09.052282,12869.0,1,12868.0,1,12870.0,19,12867.5,2,12872.5,2,12867.0,1,12868.0,1,1
2318276,2017-11-20 08:00:09.052282,12869.0,1,12867.5,2,12870.0,19,12867.0,1,12872.5,2,12865.5,1,12868.0,1,0
2318276,2017-11-20 08:00:09.052291,12869.0,1,12867.5,2,12870.0,19,12867.0,1,12872.5,2,12865.5,1,12867.5,1,1
2318276,2017-11-20 08:00:09.052291,12869.0,1,12867.5,1,12870.0,19,12867.0,1,12872.5,2,12865.5,1,12867.5,1,0
2318276,2017-11-20 08:00:09.052315,12869.0,1,12867.5,1,12870.0,19,12867.0,1,12872.5,2,12865.5,1,12867.5,1,1
2318276,2017-11-20 08:00:09.052315,12869.0,1,12867.5,1,12870.0,19,12867.0,1,12872.5,2,12865.5,1,12867.0,1,1
2318276,2017-11-20 08:00:09.052315,12869.0,1,12867.5,1,12870.0,19,12867.0,1,12872.5,2,12865.5,1,12865.5,1,1
2318276,2017-11-20 08:00:09.052315,12869.0,1,12867.5,1,12870.0,19,12867.0,1,12872.5,2,12865.5,1,12865.0,1,1
2318276,2017-11-20 08:00:09.052315,12869.0,1,12867.5,1,12870.0,19,12867.0,1,12872.5,2,12865.5,1,12864.0,1,1
2318276,2017-11-20 08:00:09.052315,12869.0,1,12861.5,2,12870.0,19,12861.0,1,12872.5,2,12860.0,5,12864.0,1,0
2318276,2017-11-20 08:00:09.052335,12869.0,1,12861.5,2,12870.0,19,12861.0,1,12872.5,2,12860.0,5,12861.5,1,1
2318276,2017-11-20 08:00:09.052335,12869.0,1,12861.5,1,12870.0,19,12861.0,1,12872.5,2,12860.0,5,12861.5,1,0
2318276,2017-11-20 08:00:09.052348,12869.0,1,12861.5,1,12870.0,19,12861.0,1,12872.5,2,12860.0,5,12861.5,1,1
2318276,2017-11-20 08:00:09.052348,12869.0,1,12861.0,1,12870.0,19,12860.0,5,12872.5,2,12859.5,3,12861.5,1,0
2318276,2017-11-20 08:00:09.052357,12869.0,1,12861.0,1,12870.0,19,12860.0,5,12872.5,2,12859.5,3,12861.0,1,1
2318276,2017-11-20 08:00:09.052357,12869.0,1,12860.0,5,12870.0,19,12859.5,3,12872.5,2,12858.0,1,12861.0,1,0
2318276,2017-11-20 08:00:09.052365,12869.0,1,12860.0,5,12870.0,19,12859.5,3,12872.5,2,12858.0,1,12860.0,1,1
2318276,2017-11-20 08:00:09.052365,12869.0,1,12860.0,4,12870.0,19,12859.5,3,12872.5,2,12858.0,1,12860.0,1,0
2318276,2017-11-20 08:00:09.052450,12869.0,1,12860.0,4,12870.0,19,12859.5,3,12872.5,2,12858.0,1,12860.0,4,1
2318276,2017-11-20 08:00:09.052450,12869.0,1,12860.0,4,12870.0,19,12859.5,3,12872.5,2,12858.0,1,12859.5,3,1
2318276,2017-11-20 08:00:09.052450,12869.0,1,12858.0,1,12870.0,19,12857.5,10,12872.5,2,12857.0,3,12859.5,3,0
2318276,2017-11-20 08:00:09.052560,12868.5,1,12858.0,1,12869.0,1,12857.5,10,12870.0,19,12857.0,3,12859.5,3,0
2318276,2017-11-20 08:00:09.052690,12868.5,1,12866.0,1,12869.0,1,12858.0,1,12870.0,19,12857.5,10,12859.5,3,0
2318276,2017-11-20 08:00:09.052693,12868.5,1,12866.0,1,12869.0,1,12865.5,1,12870.0,19,12858.0,1,12859.5,3,0
2318276,2017-11-20 08:00:09.052698,12868.5,1,12866.0,1,12869.0,1,12865.5,1,12870.0,19,12865.0,1,12859.5,3,0
2318276,2017-11-20 08:00:09.052914,12869.0,1,12866.0,1,12870.0,19,12865.5,1,12872.5,2,12865.0,1,12859.5,3,0
2318276,2017-11-20 08:00:09.052973,12869.0,1,12866.0,1,12869.5,1,12865.5,1,12870.0,19,12865.0,1,12859.5,3,0
2318276,2017-11-20 08:00:09.052976,12869.0,1,12866.0,1,12869.5,1,12865.5,1,12870.0,20,12865.0,1,12859.5,3,0
2318276,2017-11-20 08:00:09.053321,12867.5,1,12866.0,1,12869.0,1,12865.5,1,12869.5,1,12865.0,1,12859.5,3,0
2318276,2017-11-20 08:00:09.053324,12867.5,1,12866.0,1,12868.0,1,12865.5,1,12869.0,1,12865.0,1,12859.5,3,0
2318276,2017-11-20 08:00:09.053329,12867.5,1,12866.0,1,12868.0,1,12865.5,1,12868.5,1,12865.0,1,12859.5,3,0
2318276,2017-11-20 08:00:09.053505,12867.5,1,12866.0,1,12868.0,1,12865.5,1,12868.5,1,12865.0,1,12866.0,1,1
2318276,2017-11-20 08:00:09.053505,12867.5,1,12866.0,1,12868.0,1,12865.5,1,12868.5,1,12865.0,1,12865.5,1,1
2318276,2017-11-20 08:00:09.053505,12867.5,1,12866.0,1,12868.0,1,12865.5,1,12868.5,1,12865.0,1,12865.0,1,1
2318276,2017-11-20 08:00:09.053505,12867.5,1,12866.0,1,12868.0,1,12865.5,1,12868.5,1,12865.0,1,12864.5,1,1
2318276,2017-11-20 08:00:09.053505,12867.5,1,12866.0,1,12868.0,1,12865.5,1,12868.5,1,12865.0,1,12864.0,1,1
2318276,2017-11-20 08:00:09.053505,12867.5,1,12866.0,1,12868.0,1,12865.5,1,12868.5,1,12865.0,1,12863.5,1,1
2318276,2017-11-20 08:00:09.053505,12867.5,1,12866.0,1,12868.0,1,12865.5,1,12868.5,1,12865.0,1,12863.0,1,1
2318276,2017-11-20 08:00:09.053505,12867.5,1,12866.0,1,12868.0,1,12865.5,1,12868.5,1,12865.0,1,12862.5,1,1
2318276,2017-11-20 08:00:09.053505,12867.5,1,12866.0,1,12868.0,1,12865.5,1,12868.5,1,12865.0,1,12862.0,1,1
2318276,2017-11-20 08:00:09.053505,12867.5,1,12866.0,1,12868.0,1,12865.5,1,12868.5,1,12865.0,1,12861.5,1,1
2318276,2017-11-20 08:00:09.053505,12867.5,1,12866.0,1,12868.0,1,12865.5,1,12868.5,1,12865.0,1,12861.0,1,1
2318276,2017-11-20 08:00:09.053505,12867.5,1,12866.0,1,12868.0,1,12865.5,1,12868.5,1,12865.0,1,12860.5,1,1
2318276,2017-11-20 08:00:09.053505,12867.5,1,12866.0,1,12868.0,1,12865.5,1,12868.5,1,12865.0,1,12860.0,1,1
2318276,2017-11-20 08:00:09.053505,12867.5,1,12866.0,1,12868.0,1,12865.5,1,12868.5,1,12865.0,1,12859.5,1,1
2318276,2017-11-20 08:00:09.053505,12867.5,1,12866.0,1,12868.0,1,12865.5,1,12868.5,1,12865.0,1,12859.0,1,1
2318276,2017-11-20 08:00:09.053505,12867.5,1,12866.0,1,12868.0,1,12865.5,1,12868.5,1,12865.0,1,12858.5,1,1
2318276,2017-11-20 08:00:09.053505,12867.5,1,12866.0,1,12868.0,1,12865.5,1,12868.5,1,12865.0,1,12858.0,2,1
2318276,2017-11-20 08:00:09.053505,12867.5,1,12866.0,1,12868.0,1,12865.5,1,12868.5,1,12865.0,1,12857.5,11,1
2318276,2017-11-20 08:00:09.053505,12867.5,1,12866.0,1,12868.0,1,12865.5,1,12868.5,1,12865.0,1,12857.0,1,1
2318276,2017-11-20 08:00:09.053505,12867.5,1,12866.0,1,12868.0,1,12865.5,1,12868.5,1,12865.0,1,12857.0,3,1
2318276,2017-11-20 08:00:09.053505,12867.5,1,12866.0,1,12868.0,1,12865.5,1,12868.5,1,12865.0,1,12856.5,2,1
2318276,2017-11-20 08:00:09.053505,12867.5,1,12866.0,1,12868.0,1,12865.5,1,12868.5,1,12865.0,1,12856.0,3,1
2318276,2017-11-20 08:00:09.053505,12867.5,1,12866.0,1,12868.0,1,12865.5,1,12868.5,1,12865.0,1,12856.0,1,1
2318276,2017-11-20 08:00:09.053505,12867.5,1,12855.5,1,12868.0,1,12855.0,4,12868.5,1,12854.5,5,12856.0,1,0
2318276,2017-11-20 08:00:09.053515,12867.0,1,12855.5,1,12867.5,1,12855.0,4,12868.0,1,12854.5,5,12856.0,1,0
2318276,2017-11-20 08:00:09.053540,12867.0,1,12855.5,1,12867.5,1,12855.0,4,12868.0,3,12854.5,5,12856.0,1,0
2318276,2017-11-20 08:00:09.053560,12866.5,1,12855.5,1,12867.0,1,12855.0,4,12867.5,1,12854.5,5,12856.0,1,0
2318276,2017-11-20 08:00:09.053595,12866.5,1,12855.5,1,12867.0,2,12855.0,4,12867.5,1,12854.5,5,12856.0,1,0
2318276,2017-11-20 08:00:09.053914,12866.5,1,12862.5,1,12867.0,2,12855.5,1,12867.5,1,12855.0,4,12856.0,1,0
2318276,2017-11-20 08:00:09.053921,12866.5,1,12855.5,1,12867.0,2,12855.0,4,12867.5,1,12854.5,5,12856.0,1,0
2318276,2017-11-20 08:00:09.053982,12866.0,1,12855.5,1,12866.5,1,12855.0,4,12867.0,2,12854.5,5,12856.0,1,0
2318276,2017-11-20 08:00:09.054125,12863.0,1,12855.5,1,12866.0,1,12855.0,4,12866.5,1,12854.5,5,12856.0,1,0
2318276,2017-11-20 08:00:09.055129,12863.0,1,12855.5,1,12866.0,1,12855.0,4,12866.5,2,12854.5,5,12856.0,1,0
2318276,2017-11-20 08:00:09.055276,12862.5,1,12855.5,1,12863.0,1,12855.0,4,12866.0,1,12854.5,5,12856.0,1,0
2318276,2017-11-20 08:00:09.055326,12862.5,1,12856.0,1,12863.0,1,12855.5,1,12866.0,1,12855.0,4,12856.0,1,0
2318276,2017-11-20 08:00:09.056402,12862.5,1,12859.5,1,12863.0,1,12856.0,1,12866.0,1,12855.5,1,12856.0,1,0
2318276,2017-11-20 08:00:09.056409,12862.5,1,12859.5,1,12863.0,1,12859.0,1,12866.0,1,12856.0,1,12856.0,1,0
2318276,2017-11-20 08:00:09.056413,12862.5,1,12859.5,1,12863.0,1,12859.0,1,12866.0,1,12858.5,1,12856.0,1,0
2318276,2017-11-20 08:00:09.069027,12862.5,1,12859.5,1,12863.0,1,12859.0,1,12866.0,1,12858.5,1,12862.5,1,1
2318276,2017-11-20 08:00:09.095712,12862.5,1,12859.5,1,12863.0,1,12859.0,1,12866.0,1,12858.5,1,12862.5,1,1
2318276,2017-11-20 08:00:09.096094,12862.5,1,12859.5,1,12863.0,1,12859.0,1,12866.0,1,12858.5,1,12862.5,1,1
2318276,2017-11-20 08:00:09.096808,12862.5,1,12859.5,1,12863.0,1,12859.0,1,12866.0,1,12858.5,1,12862.5,1,1
2318276,2017-11-20 08:00:09.106377,12862.5,1,12859.5,1,12863.0,1,12859.0,1,12866.0,1,12858.5,1,12862.5,1,1
2318276,2017-11-20 08:00:09.113108,12862.5,1,12859.5,1,12863.0,1,12859.0,4,12866.0,1,12858.5,1,12862.5,1,0
2318276,2017-11-20 08:00:09.116699,12862.5,1,12859.5,1,12863.0,1,12859.0,4,12866.0,1,12858.5,1,12862.5,5,1
2318276,2017-11-20 08:00:09.169364,12862.5,1,12859.5,2,12863.0,1,12859.0,4,12866.0,1,12858.5,1,12862.5,5,0
2318276,2017-11-20 08:00:09.192889,12862.5,1,12859.5,2,12863.0,1,12859.0,4,12866.0,1,12858.5,1,12862.5,4,1
2318276,2017-11-20 08:00:09.280017,12862.5,1,12859.5,2,12863.0,1,12859.0,4,12866.0,1,12858.5,1,12862.5,1,1
2318276,2017-11-20 08:00:09.303498,12862.5,1,12859.5,2,12863.0,1,12859.0,4,12866.0,1,12858.5,1,12859.5,2,1
2318276,2017-11-20 08:00:09.303498,12862.5,1,12859.5,2,12863.0,1,12859.0,4,12866.0,1,12858.5,1,12859.0,2,1
2318276,2017-11-20 08:00:09.303498,12862.5,1,12859.5,2,12863.0,1,12859.0,4,12866.0,1,12858.5,1,12859.0,1,1
2318276,2017-11-20 08:00:09.303498,12862.5,1,12859.0,1,12863.0,1,12858.5,1,12866.0,1,12858.0,1,12859.0,1,0
2318276,2017-11-20 08:00:09.303541,12862.5,1,12859.5,1,12863.0,1,12859.0,1,12866.0,1,12858.5,1,12859.0,1,0
2318276,2017-11-20 08:00:09.346945,12862.5,1,12859.5,1,12863.0,1,12859.0,3,12866.0,1,12858.5,1,12859.0,1,0
2318276,2017-11-20 08:00:09.346996,12862.5,1,12860.0,1,12863.0,1,12859.5,1,12866.0,1,12859.0,3,12859.0,1,0
2318276,2017-11-20 08:00:09.433446,12862.5,1,12860.0,1,12863.0,1,12859.5,1,12866.0,1,12859.0,3,12860.0,1,1
2318276,2017-11-20 08:00:09.433446,12862.5,1,12859.5,1,12863.0,1,12859.0,3,12866.0,1,12858.5,2,12860.0,1,0
2318276,2017-11-20 08:00:09.433752,12862.5,1,12860.0,1,12863.0,1,12859.5,1,12866.0,1,12859.0,3,12860.0,1,0
2318276,2017-11-20 08:00:09.510752,12862.5,1,12860.0,1,12863.0,1,12859.5,1,12866.0,1,12859.0,3,12860.0,1,1
2318276,2017-11-20 08:00:09.510752,12862.5,1,12859.5,1,12863.0,1,12859.0,3,12866.0,1,12858.5,2,12860.0,1,0
2318276,2017-11-20 08:00:09.511015,12862.5,1,12860.0,1,12863.0,1,12859.5,1,12866.0,1,12859.0,3,12860.0,1,0
2318276,2017-11-20 08:00:09.532805,12862.5,1,12860.0,2,12863.0,1,12859.5,1,12866.0,1,12859.0,3,12860.0,1,0
2318276,2017-11-20 08:00:09.542079,12862.5,1,12860.0,2,12863.0,1,12859.5,1,12866.0,1,12859.0,3,12862.5,1,1
Giladbi
  • 1,822
  • 3
  • 19
  • 34
  • Can you provide some sample input and expected output please? – cs95 Jan 28 '18 at 09:04
  • added the data sample – Giladbi Jan 28 '18 at 09:16
  • Is this what you want? `df[df.isTrade.eq(1)].groupby('dateTime').tradeBid.agg(['sum', 'count'])` – cs95 Jan 28 '18 at 09:18
  • If you get Syntax Errors, just manually type the command out yourself instead of copy pasting. – cs95 Jan 28 '18 at 09:19
  • as I wrote, 'count' working for 0 values as well and therefor it's function is wrong. I need to count only values above 0. I wrote it in the question. this is not a duplicate. – Giladbi Jan 28 '18 at 09:22
  • 1
    Just use `['sum', lambda x: (x > 0).sum()]`. It still kinda is, just that your specifications are a little different :) – cs95 Jan 28 '18 at 09:24
  • 1
    Worked. thanks. though I will be happy if you could explain. thanks! – Giladbi Jan 28 '18 at 09:30
  • 1
    `agg` lets you specify multiple aggregation functions in a list. The first is self explanatory. The second is an anonymous function that takes a series, finds all items that are greater than 0 (resulting in a mask of True/False), and just sums it. Since True = 1, and False = 0, summing the mask gives you what you want. – cs95 Jan 28 '18 at 09:31
  • 1
    thank you very much. now I got it. had no idea it's functioned as true/false – Giladbi Jan 28 '18 at 09:35
  • BTW the lambda is increasing the time need to calculate from 0.6 seconds to 6 seconds! is there any faster solution? – Giladbi Jan 28 '18 at 10:55
  • Based on what did you measure 0.5 before? This is literally the fastest solution, sort of defining a named function `def mycount(x): return (x > 0).sum()` and passing `mycount` instead of a lambda. – cs95 Jan 28 '18 at 10:57
  • no change between using the "def mycount" and "lambda". nay other idea how can this increase the time X10? – Giladbi Jan 28 '18 at 11:08
  • I'm still confused as to what you are comparing this to. `size`/`count` is not nearly as expensive as this is, and the comparison is unfair since they do different things. – cs95 Jan 28 '18 at 11:09
  • i compare it to nothing (before i had that count if) regular expressions like sum/count etc. don't cause so much time but here the function of count it is causing a lot of time. i am asking how this small task is so demanding – Giladbi Jan 28 '18 at 11:13
  • I don't know... I can't answer that because I have no clue about your data or your environment. But I can tell you this is the fastest way of doing it. I'd assume you have a lot of data otherwise I can't explain the cause for the time. – cs95 Jan 28 '18 at 11:17
  • csv data is big (90 mb) and on windows environment. but still it just became 10 times slower! – Giladbi Jan 28 '18 at 11:26

0 Answers0