0

Please see my for_stack dataframe data which was saved via for_stack.to_pickle('for_stack')

As per following image I need to add a new column showing the amount of time between Timestamp (09:30) and 'Gap Lower Closed First' which would be 403 minutes for row 1.

I need to only do this for the first row of each day at 09:30 as highlighted. I would ideally like a new dataframe showing only the 09:30 entry for each day if this is possible?

Thanks for any help.

for_stack image

I have tried using the following (incorrect code) for a timedelta but just get a NaT

data['tvalue'] = data.index
data['delta'] = (data['Gap Lower Closed first'] - data['tvalue'])

'Gap Lower Closed first' is dtype: datetime64[ns]

nipy
  • 5,138
  • 5
  • 31
  • 72

2 Answers2

2

You can use the pandas.TimeGrouper (i couldn't find the documentation for it) and the first aggregate.

Example:

In [26]: df = pandas.DataFrame(index=pd.date_range('2016-01-01T09:30:00', periods=10, freq='30t') + pd.date_range('2016-01-02T09:30:00', periods=10, freq='30t'), data={'a': np.random.randn(20)})
manage.py:1: FutureWarning: using '+' to provide set union with datetimelike Indexes is deprecated, use .union()
  #!/usr/bin/env python


In [27]: df
Out[27]: 
                            a
2016-01-01 09:30:00 -0.693846
2016-01-01 10:00:00  1.627871
2016-01-01 10:30:00 -0.157882
2016-01-01 11:00:00  0.126959
2016-01-01 11:30:00 -0.865513
2016-01-01 12:00:00  0.042917
2016-01-01 12:30:00 -0.260965
2016-01-01 13:00:00  1.813741
2016-01-01 13:30:00 -1.108866
2016-01-01 14:00:00  1.030709
2016-01-02 09:30:00 -0.063701
2016-01-02 10:00:00 -0.695245
2016-01-02 10:30:00 -0.945378
2016-01-02 11:00:00 -0.394078
2016-01-02 11:30:00  2.005444
2016-01-02 12:00:00  0.920097
2016-01-02 12:30:00  0.329173
2016-01-02 13:00:00  1.951834
2016-01-02 13:30:00 -2.143820
2016-01-02 14:00:00 -0.357149

In [28]: df.groupby(pd.TimeGrouper(freq='1d')).first()
Out[28]: 
                   a
2016-01-01 -0.693846
2016-01-02 -0.063701

In your case, you can do

dfg = df.groupby(pd.TimeGrouper(freq='1d')).first()
dfg['delta'] = dfg['Gap Lower closed first'] - dfg.index
Guillaume Thomas
  • 2,220
  • 2
  • 24
  • 33
1

This I think does what you want:

import pandas as pd
import numpy as np
import datetime

data = {'t1':[datetime.datetime(2014, 3, 10, 9, 30, 0), 
              datetime.datetime(2014, 3, 10, 8, 33, 0)], 
        't2':[datetime.datetime(2014, 3, 11, 10, 34, 0), 
              datetime.datetime(2014, 3, 10, 11, 41, 9)]
        }

df = pd.DataFrame(data)
df = df.set_index('t1')
df['t_diff'] = df.t2 - df.index

In [15]: df
Out[15]:
                                     t2          t_diff
t1
2014-03-10 09:33:00 2014-03-11 10:34:00 1 days 01:01:00
2014-03-10 08:33:00 2014-03-10 11:41:09 0 days 03:08:09


df930 = df[np.logical_and(df.index.hour == 9, df.index.minute == 30)]

In [24]: df930
Out[24]:
                                     t2          t_diff
t1
2014-03-10 09:30:00 2014-03-11 10:34:00 1 days 01:04:00

I always use np.logical_and on multiple conditions like that, because it interprets it more broadly than I usually mean if I just use and, as in this: ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Community
  • 1
  • 1
Jeff
  • 2,158
  • 1
  • 16
  • 29