0

I have a df containing minute bars of different symbols like so:

                       timestamp    open    high      low   close  volume  trade_count        vwap symbol
0      2021-10-13 08:00:00+00:00  140.20  140.40  140.000  140.40    6084           65  140.205417   AAPL
1      2021-10-13 08:01:00+00:00  140.35  140.40  140.200  140.40    3052           58  140.308182   AAPL
2      2021-10-13 08:02:00+00:00  140.35  140.35  140.350  140.35     632           30  140.320934   AAPL
3      2021-10-13 08:03:00+00:00  140.28  140.30  140.200  140.20    2867           36  140.279473   AAPL
4      2021-10-13 08:04:00+00:00  140.20  140.20  140.200  140.20     435           36  140.199195   AAPL
...                          ...     ...     ...      ...     ...     ...          ...         ...    ...
58250  2021-10-27 19:58:00+00:00  209.31  209.33  209.215  209.26   26440          348  209.251852    ZTS
58251  2021-10-27 19:59:00+00:00  209.28  209.59  209.010  209.56  109758         1060  209.384672    ZTS
58252  2021-10-27 20:03:00+00:00  209.58  209.58  209.580  209.58  537786           49  209.580000    ZTS
58253  2021-10-27 20:05:00+00:00  209.58  209.58  209.580  209.58    4170            1  209.580000    ZTS
58254  2021-10-27 20:12:00+00:00  209.58  209.58  209.580  209.58     144            1  209.580000    ZTS

[58255 rows x 9 columns]

I want to be able to use df.groupby so I can loop over each of the days of each ticker. Something like:

                       timestamp    open    high      low   close  volume  trade_count        vwap symbol
0      2021-10-13 08:00:00+00:00  140.20  140.40  140.000  140.40    6084           65  140.205417   AAPL
1      2021-10-13 08:01:00+00:00  140.35  140.40  140.200  140.40    3052           58  140.308182   AAPL
2      2021-10-13 08:02:00+00:00  140.35  140.35  140.350  140.35     632           30  140.320934   AAPL
3      2021-10-13 08:03:00+00:00  140.28  140.30  140.200  140.20    2867           36  140.279473   AAPL
4      2021-10-13 08:04:00+00:00  140.20  140.20  140.200  140.20     435           36  140.199195   AAPL



                       timestamp    open    high      low   close  volume  trade_count        vwap symbol
0      2021-10-14 08:00:00+00:00  140.20  140.40  140.000  140.40    6084           65  140.205417   AAPL
1      2021-10-14 08:01:00+00:00  140.35  140.40  140.200  140.40    3052           58  140.308182   AAPL
2      2021-10-14 08:02:00+00:00  140.35  140.35  140.350  140.35     632           30  140.320934   AAPL
3      2021-10-14 08:03:00+00:00  140.28  140.30  140.200  140.20    2867           36  140.279473   AAPL
4      2021-10-14 08:04:00+00:00  140.20  140.20  140.200  140.20     435           36  140.199195   AAPL

How can I do this?

Someone suggested I look at another question:

table = df.groupby(pd.Grouper(key='timestamp', axis=0, freq='D')).sum()

But this takes the minute data and returns daily:

Name: 2022-04-04 00:00:00+00:00, dtype: float64)
(Timestamp('2022-04-05 00:00:00+0000', tz='UTC', freq='D'), open           0.0
high           0.0
low            0.0
close          0.0
volume         0.0
trade_count    0.0
vwap           0.0
Name: 2022-04-05 00:00:00+00:00, dtype: float64)
(Timestamp('2022-04-06 00:00:00+0000', tz='UTC', freq='D'), open            2000.818300
high            2001.724000
low             2000.563300
close           2001.462900
volume         59717.000000
trade_count      487.000000
vwap            2001.073115
Name: 2022-04-06 00:00:00+00:00, dtype: float64)

I need to take my minute data and split the minutes into separate days. I don't need to upscale to daily bars like was suggested here.

a7dc
  • 3,323
  • 7
  • 32
  • 50

3 Answers3

1

https://pandas.pydata.org/docs/user_guide/basics.html#by-values

I think the decision depends on the actual aim of looping over days plus symbols.

df = df.sort_values(by=["timestamp", "symbol"])
Raibek
  • 558
  • 3
  • 6
1

Are you looking for:

# ensure datetime
df['timestamp'] = pd.to_datetime(df['timestamp'])

# group by symbol and date
out = df.groupby(['symbol', df['timestamp'].dt.date]).sum()
mozway
  • 194,879
  • 13
  • 39
  • 75
0

Thanks to Raibek and Mozway for their help! It was Mozway's answer which got me there:

df['timestamp'] = pd.to_datetime(df['timestamp'])

table = df.groupby(['symbol', df['timestamp'].dt.date]) # remove the sum


for ohlc in table:
    print(ohlc)

Which makes each day its own df:

(('QCOM', datetime.date(2021, 11, 22)),                       timestamp    open    high     low   close  volume  trade_count        vwap symbol
44960 2021-11-22 09:15:00+00:00  185.11  185.11  185.11  185.11     331            9  185.075559   QCOM
44961 2021-11-22 09:22:00+00:00  185.15  185.15  185.15  185.15     180            6  185.136722   QCOM
44962 2021-11-22 09:27:00+00:00  185.54  185.73  185.54  185.73     631           12  185.669271   QCOM
44963 2021-11-22 09:29:00+00:00  185.65  185.65  185.65  185.65     302            4  185.663510   QCOM
44964 2021-11-22 09:49:00+00:00  185.63  185.63  185.63  185.63     179            6  185.632626   QCOM
...                         ...     ...     ...     ...     ...     ...          ...         ...    ...
45499 2021-11-22 23:37:00+00:00  181.75  181.75  181.75  181.75     309            8  181.653786   QCOM
45500 2021-11-22 23:45:00+00:00  181.64  181.64  181.64  181.64     193            2  181.640000   QCOM
45501 2021-11-22 23:52:00+00:00  181.64  181.64  181.64  181.64     187            3  181.644064   QCOM
45502 2021-11-22 23:54:00+00:00  181.53  181.53  181.50  181.50    2000           12  181.511655   QCOM
45503 2021-11-22 23:55:00+00:00  181.50  181.50  181.50  181.50    5100            3  181.500000   QCOM

[544 rows x 9 columns])
(('QCOM', datetime.date(2021, 11, 23)),                       timestamp    open      high     low     close  volume  trade_count        vwap symbol
45504 2021-11-23 00:09:00+00:00  181.50  181.5000  181.50  181.5000    1250            5  181.502240   QCOM
45505 2021-11-23 00:10:00+00:00  181.50  181.5000  181.50  181.5000    1612            5  181.500000   QCOM
45506 2021-11-23 00:15:00+00:00  181.55  181.5500  181.55  181.5500     640            3  181.551563   QCOM
45507 2021-11-23 00:19:00+00:00  181.50  181.5000  181.50  181.5000     253            4  181.500000   QCOM
45508 2021-11-23 00:27:00+00:00  181.55  181.5500  181.55  181.5500     320            3  181.547500   QCOM
45509 2021-11-23 00:31:00+00:00  181.55  181.5500  181.55  181.5500     100            1  181.550000   QCOM
45510 2021-11-23 00:32:00+00:00  181.50  181.5000  181.50  181.5000     800            3  181.500000   QCOM
45511 2021-11-23 00:33:00+00:00  181.40  181.4000  181.40  181.4000    1361           10  181.400786   QCOM
45512 2021-11-23 00:34:00+00:00  181.33  181.3300  181.00  181.0000    5446           47  181.179884   QCOM
45513 2021-11-23 00:35:00+00:00  181.00  181.0000  180.76  180.7600    2674           21  180.889443   QCOM
45514 2021-11-23 00:36:00+00:00  180.77  180.9900  180.77  180.9900     241            4  180.860871   QCOM
45515 2021-11-23 00:38:00+00:00  180.77  181.1899  180.77  181.1899     300            5  180.979033   QCOM
45516 2021-11-23 00:39:00+00:00  180.75  180.7500  180.75  180.7500     238            3  180.751597   QCOM)
a7dc
  • 3,323
  • 7
  • 32
  • 50