I'd like to copy dataframe df into df2 at 5-minute intervals instead of 1-minute intervals.

How can I extract rows at a particular interval?

>>> df = pd.read_csv('sample.txt', names=['Time', 'Index', 'Volume'], sep=r'\s+')
>>> df.head(20)
                Time   Index  Volume
0   2019/05/16-09:15  27830    2031
1   2019/05/16-09:16  27815     995
2   2019/05/16-09:17  27829     961
3   2019/05/16-09:18  27848     663
4   2019/05/16-09:19  27873     869
5   2019/05/16-09:20  27847     854
6   2019/05/16-09:21  27828     784
7   2019/05/16-09:22  27813     676
8   2019/05/16-09:23  27828     700
9   2019/05/16-09:24  27849     665
10  2019/05/16-09:25  27851     487
11  2019/05/16-09:26  27859     451
12  2019/05/16-09:27  27874     548
13  2019/05/16-09:28  27898    1105
14  2019/05/16-09:29  27907     832
15  2019/05/16-09:30  27897    1470

My expected dataframe

>>> df2
                Time   Index  Volume
0   2019/05/16-09:15  27830    2031
5   2019/05/16-09:20  27847     854
10  2019/05/16-09:25  27851     487
15  2019/05/16-09:30  27897    1470
.....
idiot one
You can use slicing. See this: https://stackoverflow.com/questions/10665889/how-to-take-column-slices-of-dataframe-in-pandas – WhySoSerious May 17 '19 at 08:29
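A minimal sketch of the slicing idea from the comment, assuming the data has exactly one row per minute with no gaps. The frame below is a hypothetical stand-in for the first rows of sample.txt; positional slicing with iloc keeps every 5th row and preserves the original integer index.

```python
import pandas as pd

# Hypothetical stand-in for the first 16 rows of sample.txt from the question.
df = pd.DataFrame({
    "Time": [f"2019/05/16-09:{m:02d}" for m in range(15, 31)],
    "Index": [27830, 27815, 27829, 27848, 27873, 27847, 27828, 27813,
              27828, 27849, 27851, 27859, 27874, 27898, 27907, 27897],
    "Volume": [2031, 995, 961, 663, 869, 854, 784, 676,
               700, 665, 487, 451, 548, 1105, 832, 1470],
})

# Take every 5th row by position, starting at row 0.
# The original index labels (0, 5, 10, 15) are kept, as in the expected df2.
df2 = df.iloc[::5].copy()
print(df2)
```

This only works when no minute is missing; if rows can be absent, selecting by position drifts away from the 5-minute marks and a time-based method is safer.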

2 Answers

Use DataFrame.asfreq, which requires a DatetimeIndex: first convert the Time column to datetime, then make it the index with DataFrame.set_index:

df['Time'] = pd.to_datetime(df['Time'])
# '5min' is the current alias for five minutes; '5T' is deprecated in recent pandas
df = df.set_index('Time').asfreq('5min').reset_index()
print(df)
                 Time  Index  Volume
0 2019-05-16 09:15:00  27830    2031
1 2019-05-16 09:20:00  27847     854
2 2019-05-16 09:25:00  27851     487
3 2019-05-16 09:30:00  27897    1470

If the original string format of the Time column must be kept:

# Parse the datetimes into the index only; the Time column keeps its original strings
df.index = pd.to_datetime(df['Time'])
df = df.asfreq('5min').reset_index(drop=True)
print(df)
               Time  Index  Volume
0  2019/05/16-09:15  27830    2031
1  2019/05/16-09:20  27847     854
2  2019/05/16-09:25  27851     487
3  2019/05/16-09:30  27897    1470
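One behaviour worth knowing: asfreq reindexes onto a regular grid rather than picking existing rows, so a missing timestamp shows up as a NaN row. The sketch below uses hypothetical data with the 09:20 row removed, and passes an explicit format string (an assumption based on the Time values shown in the question) so parsing does not rely on format inference.

```python
import pandas as pd

# Hypothetical data: one-minute rows from 09:15 to 09:30, with 09:20 missing.
minutes = [m for m in range(15, 31) if m != 20]
df = pd.DataFrame({
    "Time": [f"2019/05/16-09:{m:02d}" for m in minutes],
    "Volume": range(len(minutes)),
})

# Explicit format matching strings like '2019/05/16-09:15'.
ts = pd.to_datetime(df["Time"], format="%Y/%m/%d-%H:%M")

# Reindex onto a 5-minute grid; the grid point 09:20 has no source row.
out = df.set_index(ts).asfreq("5min")
print(out)

# Drop grid points that had no matching row in the input.
print(out.dropna(how="all"))
```

If gaps should be filled instead of dropped, asfreq also accepts a method argument such as 'ffill'; whether that is appropriate depends on the data.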
jezrael
I think you are looking for a groupby-style operation. For datetime indexing there is a convenience method, .resample, which groups rows into fixed time bins (it requires a DatetimeIndex):

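 # Time must be parsed to datetimes first; first() keeps the opening row of each 5-minute bin
 df.set_index(pd.to_datetime(df['Time'], format='%Y/%m/%d-%H:%M')).resample('5min').first()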

produces:

Out[156]:
0   2019/05/16-09:15  27830    2031
5   2019/05/16-09:20  27847     854
10  2019/05/16-09:25  27851     487
15  2019/05/16-09:30  27897    1470
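The choice of aggregation matters with resample. A small sketch, using the first eleven Index values from the question as illustrative data: with the default left-closed, left-labelled 5-minute bins, first() returns the value at the bin's starting minute, while last() returns the value from the final minute inside the bin.

```python
import pandas as pd

# Illustrative data: rows 09:15 through 09:25 from the question.
df = pd.DataFrame({
    "Time": [f"2019/05/16-09:{m:02d}" for m in range(15, 26)],
    "Index": [27830, 27815, 27829, 27848, 27873, 27847,
              27828, 27813, 27828, 27849, 27851],
})
ts = pd.to_datetime(df["Time"], format="%Y/%m/%d-%H:%M")

# Bins are [09:15, 09:20), [09:20, 09:25), [09:25, 09:30), labelled by their left edge.
binned = df.set_index(ts)["Index"].resample("5min")

first = binned.first()  # values at 09:15, 09:20, 09:25
last = binned.last()    # values at 09:19, 09:24, 09:25

print(first.tolist())  # [27830, 27847, 27851]
print(last.tolist())   # [27873, 27849, 27851]
```

So to reproduce the expected df2, first() is the aggregation that matches; last() would label the 09:19 value as 09:15.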
Mark