How to use groupby() with between_time()?

Question

I have a DataFrame and want to multiply all values in column a for a given day by the value of a at 06:00:00 on that day. If a day has no 06:00:00 entry, it should stay unchanged.

The code below unfortunately gives an error.

How do I correct this code, or what working solution could replace it?

import pandas as pd
import numpy as np

start = pd.Timestamp('2000-01-01')
end = pd.Timestamp('2000-01-03')
t = np.linspace(start.value, end.value, 9)
datetime1 = pd.to_datetime(t)
df = pd.DataFrame( {'a':[1,3,4,5,6,7,8,9,14]})
df['date']= datetime1
print(df)

def myF(x):
    y = x.set_index('date').between_time('05:59', '06:01').a
    return y


toMultiplyWith =  df.groupby(df.date.dt.floor('D')).transform(myF)

    a                date
0   1 2000-01-01 00:00:00
1   3 2000-01-01 06:00:00
2   4 2000-01-01 12:00:00
3   5 2000-01-01 18:00:00
4   6 2000-01-02 00:00:00
5   7 2000-01-02 06:00:00
6   8 2000-01-02 12:00:00
7   9 2000-01-02 18:00:00
8  14 2000-01-03 00:00:00
....
AttributeError: ("'Series' object has no attribute 'set_index'", 'occurred at index a')

score 0 · Answer 1 · answered Nov 24 '20 at 21:26

You should change this line:

toMultiplyWith = df.groupby(df.date.dt.floor('D')).transform(myF)

to this:

toMultiplyWith = df.groupby(df.date.dt.floor('D')).apply(myF)

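Using .apply instead of .transform fixes the error: for each day it returns the value of a between 05:59 and 06:01, which you can then use as the multiplier.

apply is the right choice here because it passes each group to the custom function as a DataFrame with all its columns. transform instead calls the function on each column separately as a Series (hence "occurred at index a" in the error message), and a Series has no set_index method.

To read more about the difference between the two methods, see this answer.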
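To see the apply/transform difference in action, here is a small check. It is a sketch: it rebuilds the question's sample data with pd.date_range instead of np.linspace, and the helper name record is hypothetical. Inside apply every group arrives as a DataFrame, while transform with a set_index call fails because the function is handed the individual columns as Series.

```python
import pandas as pd

# The question's sample data: nine timestamps, six hours apart.
df = pd.DataFrame({
    "a": [1, 3, 4, 5, 6, 7, 8, 9, 14],
    "date": pd.date_range("2000-01-01", periods=9, freq=pd.Timedelta(hours=6)),
})
day = df["date"].dt.floor("D")

seen = []  # type name of every object that apply hands to the function


def record(grp):
    seen.append(type(grp).__name__)
    return grp.set_index("date").between_time("05:59", "06:01")["a"]


# apply: each group is a whole DataFrame, so set_index works.
six_values = df.groupby(day).apply(record)

# transform: the function is called on the columns as Series,
# so the same set_index call raises AttributeError.
try:
    df.groupby(day).transform(lambda col: col.set_index("date"))
    transform_error = None
except AttributeError as exc:
    transform_error = exc

print(six_values)
print(set(seen), type(transform_error).__name__)
```

The apply result holds one 06:00 value per day that has one (the third day contributes nothing), indexed by the day and the original timestamp.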

score 0 · Answer 2 · answered Nov 24 '20 at 21:43

If you want to stick with between_time(), this is the way to do it:

df = df.set_index('date') 
mask = df.between_time('05:59', '06:01').index
df.loc[mask, 'a'] = df.loc[mask, 'a'] ** 2 # the operation you want to perform
df.reset_index(inplace=True)

Outputs:

                 date   a
0 2000-01-01 00:00:00   1
1 2000-01-01 06:00:00   9
2 2000-01-01 12:00:00   4
3 2000-01-01 18:00:00   5
4 2000-01-02 00:00:00   6
5 2000-01-02 06:00:00  49
6 2000-01-02 12:00:00   8
7 2000-01-02 18:00:00   9
8 2000-01-03 00:00:00  14
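The code above squares the 06:00 values themselves. To apply the question's actual rule with the same between_time mask (every value of a day multiplied by that day's 06:00 value, days without one left unchanged), one possible sketch maps a per-day factor back onto the rows. It assumes at most one row per day falls inside the 05:59-06:01 window; the names factor_by_day and a_scaled are illustrative, not from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 3, 4, 5, 6, 7, 8, 9, 14],
    "date": pd.date_range("2000-01-01", periods=9, freq=pd.Timedelta(hours=6)),
})

# Values of a whose time of day falls in the 05:59-06:01 window.
six = df.set_index("date").between_time("05:59", "06:01")["a"]

# One factor per calendar day, keyed by that day's midnight.
# Assumes at most one matching row per day (map needs a unique index).
factor_by_day = pd.Series(six.to_numpy(), index=six.index.floor("D"))

# Look up each row's day; days without a 06:00 row get factor 1 (unchanged).
factor = df["date"].dt.floor("D").map(factor_by_day).fillna(1)
df["a_scaled"] = df["a"] * factor
print(df)
```

Because the missing day introduces a NaN before fillna, a_scaled comes out as a float column.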

score 0 · Answer 3 · answered Nov 24 '20 at 22:01

If I understood your goal correctly, you can use apply to return a DataFrame with the same number of rows as the original (simulating a transform):

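def myF(grp):
    grp = grp.copy()  # work on a copy instead of the group object pandas passes in
    time = grp.date.dt.strftime('%H:%M:%S')  # '%T' is not supported on every platform
    target_idx = time == '06:00:00'
    if target_idx.any():
        # multiply every non-06:00 value by that day's 06:00 value
        grp.loc[~target_idx, 'a_sum'] = grp.loc[~target_idx, 'a'].values * grp.loc[target_idx, 'a'].values
    else:
        grp.loc[~target_idx, 'a_sum'] = np.nan
    return grp

# group_keys=False keeps the original row index (pandas >= 2.0 would otherwise prepend the day)
df.groupby(df.date.dt.floor('D'), group_keys=False).apply(myF)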

Output:

    a                date  a_sum
0   1 2000-01-01 00:00:00    3.0
1   3 2000-01-01 06:00:00    NaN
2   4 2000-01-01 12:00:00   12.0
3   5 2000-01-01 18:00:00   15.0
4   6 2000-01-02 00:00:00   42.0
5   7 2000-01-02 06:00:00    NaN
6   8 2000-01-02 12:00:00   56.0
7   9 2000-01-02 18:00:00   63.0
8  14 2000-01-03 00:00:00    NaN

For each day, every value whose time is not 06:00:00 is multiplied by that day's 06:00:00 value. The result is NaN for the 06:00:00 rows themselves and for days without a 06:00:00 entry.
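The question asks for days without a 06:00:00 row to stay unchanged rather than become NaN. If you want that behaviour without apply, a possible vectorised sketch keeps a only on the 06:00:00 rows, spreads it across each day with transform('first') (which skips NaN), and uses 1 as the factor where a day has none. Unlike the answer above, it also multiplies the 06:00:00 value by itself, following the question's wording; the column name a_scaled is illustrative.

```python
import datetime as dt

import pandas as pd

df = pd.DataFrame({
    "a": [1, 3, 4, 5, 6, 7, 8, 9, 14],
    "date": pd.date_range("2000-01-01", periods=9, freq=pd.Timedelta(hours=6)),
})
day = df["date"].dt.floor("D")

# Keep a only where the time of day is exactly 06:00:00; other rows become NaN.
at_six = df["a"].where(df["date"].dt.time == dt.time(6, 0))

# 'first' returns the first non-NaN value per day (NaN if the day has none),
# broadcast back to every row of that day.
factor = at_six.groupby(day).transform("first")

# Days without a 06:00:00 row are multiplied by 1, i.e. left unchanged.
df["a_scaled"] = df["a"] * factor.fillna(1)
print(df)
```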
