1

I want to make sub dataframes out of one dataframe, using its datetime index. For example, if I want to extract rows between 07:00~06:00 and make new dataframes:

import pandas as pd

int_rows  = 24
str_freq  = '180min'

i = pd.date_range('2018-04-09', periods=int_rows, freq=str_freq)

df = pd.DataFrame({'A': [i for i in range(int_rows)]}, index=i)

>>> df
                      A
2018-04-09 00:00:00   0
2018-04-09 03:00:00   1
2018-04-09 06:00:00   2
2018-04-09 09:00:00   3
2018-04-09 12:00:00   4
2018-04-09 15:00:00   5
2018-04-09 18:00:00   6
2018-04-09 21:00:00   7
2018-04-10 00:00:00   8
2018-04-10 03:00:00   9
2018-04-10 06:00:00  10
2018-04-10 09:00:00  11
2018-04-10 12:00:00  12
2018-04-10 15:00:00  13
2018-04-10 18:00:00  14
2018-04-10 21:00:00  15
2018-04-11 00:00:00  16
2018-04-11 03:00:00  17
2018-04-11 06:00:00  18
2018-04-11 09:00:00  19
2018-04-11 12:00:00  20
2018-04-11 15:00:00  21
2018-04-11 18:00:00  22
2018-04-11 21:00:00  23
# new dataframes that I want

                      A
2018-04-09 00:00:00   0
2018-04-09 03:00:00   1

                      A
2018-04-09 06:00:00   2
2018-04-09 09:00:00   3
2018-04-09 12:00:00   4
2018-04-09 15:00:00   5
2018-04-09 18:00:00   6
2018-04-09 21:00:00   7
2018-04-10 00:00:00   8
2018-04-10 03:00:00   9

                      A
2018-04-10 06:00:00  10
2018-04-10 09:00:00  11
2018-04-10 12:00:00  12
2018-04-10 15:00:00  13
2018-04-10 18:00:00  14
2018-04-10 21:00:00  15
2018-04-11 00:00:00  16
2018-04-11 03:00:00  17

                      A
2018-04-11 06:00:00  18
2018-04-11 09:00:00  19
2018-04-11 12:00:00  20
2018-04-11 15:00:00  21
2018-04-11 18:00:00  22
2018-04-11 21:00:00  23

I found between_time method, but it doesn't care about dates. I could iterate over the original dataframe and check each date and time, but I think it's going to be inefficient. Are there any simple ways to do this?

maynull
  • 1,936
  • 4
  • 26
  • 46

1 Answers1

1

You can 'shift' the timestamp by 6 hours and group by day:

for k, d in df.groupby((df.index - pd.to_timedelta('6:00:00')).normalize()):
    print(d); print()

Output:

                     A
2018-04-09 00:00:00  0
2018-04-09 03:00:00  1

                     A
2018-04-09 06:00:00  2
2018-04-09 09:00:00  3
2018-04-09 12:00:00  4
2018-04-09 15:00:00  5
2018-04-09 18:00:00  6
2018-04-09 21:00:00  7
2018-04-10 00:00:00  8
2018-04-10 03:00:00  9

                      A
2018-04-10 06:00:00  10
2018-04-10 09:00:00  11
2018-04-10 12:00:00  12
2018-04-10 15:00:00  13
2018-04-10 18:00:00  14
2018-04-10 21:00:00  15
2018-04-11 00:00:00  16
2018-04-11 03:00:00  17

                      A
2018-04-11 06:00:00  18
2018-04-11 09:00:00  19
2018-04-11 12:00:00  20
2018-04-11 15:00:00  21
2018-04-11 18:00:00  22
2018-04-11 21:00:00  23
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74
  • Thank you so much! You've surprisingly shortened my code! – maynull Nov 14 '19 at 04:02
  • A semicolon? `print()`? – AMC Nov 14 '19 at 04:05
  • @AlexanderCécile a semicolon so you don't have to do line break. `print()` just print blank line. – Quang Hoang Nov 14 '19 at 04:09
  • Why wouldn’t you want to write code on multiple lines? Why not skip a line by using a far more normal method, like `print(f’{d}\n’)`? – AMC Nov 14 '19 at 04:13
  • I find this way more readable and 'typable', even more than `print(d, '\n')`. But it certainly is a very personal preference. – Quang Hoang Nov 14 '19 at 04:15
  • https://stackoverflow.com/q/19365508/11301900. `print()` makes it look like the code is missing a part/unfinished, too..... – AMC Nov 14 '19 at 04:24