As I wrote in the comment, as far as I can tell, you're not looking for a groupby, but rather some operation on each row.
I came up with the following solution using apply:
import numpy as np
import pandas as pd
s = pd.to_datetime(df["pickup_datetime"])  # make sure the pickup column is datetime
r = s.apply(lambda x: np.sum(s.between(x, x + pd.Timedelta("1h")) & (s.dt.dayofyear == x.dayofyear)))
Let's break it down:
This will go over each row (apply) and create a Boolean mask based on two conditions:
- All the pickup times that fall within the hour following the current pickup time (both endpoints included).
- All the pickup times that fall on the same date (day of the year) as the current pickup time.
We then combine them with an AND operation (&).
This creates a Boolean array the same length as your Series, with True wherever both conditions are met.
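To make the mask concrete, here is a minimal sketch for a single row. The pickup times are made up for illustration and are not from your data; the names within_hour and same_day are just labels for the two conditions.

```python
import pandas as pd

# Hypothetical pickup times, invented for this illustration.
s = pd.Series(pd.to_datetime([
    "2020-01-01 10:00",
    "2020-01-01 10:30",
    "2020-01-01 11:15",
    "2020-01-01 23:30",
    "2020-01-02 00:10",
]))

x = s.iloc[3]  # the "current" pickup time: 2020-01-01 23:30

within_hour = s.between(x, x + pd.Timedelta("1h"))  # condition 1
same_day = s.dt.dayofyear == x.dayofyear            # condition 2
mask = within_hour & same_day                       # both must hold

print(within_hour.tolist())  # [False, False, False, True, True]
print(mask.tolist())         # [False, False, False, True, False]
```

The 00:10 pickup is within an hour of 23:30, but it falls on the next day, so the second condition drops it from the mask.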
Finally, we sum this Boolean array (np.sum, from NumPy), which is equivalent to counting the number of entries that meet both conditions.
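Here is a small end-to-end run of the same approach, as a sketch on made-up data. I'm assuming your frame has a pickup_datetime column stored as strings; the pickups_next_hour column name is my own invention for holding the result.

```python
import numpy as np
import pandas as pd

# Hypothetical data, invented for this illustration.
df = pd.DataFrame({"pickup_datetime": [
    "2020-01-01 10:00",
    "2020-01-01 10:30",
    "2020-01-01 11:15",
    "2020-01-01 23:30",
    "2020-01-02 00:10",
]})

s = pd.to_datetime(df["pickup_datetime"])

# For each pickup, count the pickups in the following hour on the same day.
r = s.apply(lambda x: np.sum(
    s.between(x, x + pd.Timedelta("1h")) & (s.dt.dayofyear == x.dayofyear)
))

df["pickups_next_hour"] = r  # hypothetical column name
print(r.tolist())  # [2, 2, 1, 1, 1]
```

Note that every row counts itself, since the current pickup time always satisfies both conditions. Subtract 1 if you only want the other pickups.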