5

Using the yahoo finance package in python, I am able to download the relevant data to show OCHL. What I am aiming to do, is find which time during the day is when the stock is at its highest on average.

Here is the code to download the data:

import yfinance as yf
import pandas as pd

df = yf.download(
        tickers = "APPL",
        period = "60d",
        interval = "5m",
        auto_adjust = True,
        group_by = 'ticker',
        prepost = True,
    )

maxTimes = df.groupby([df.index.month, df.index.day, df.index.day_name()])['High'].idxmax()

This gives me something like this:

Datetime  Datetime  Datetime 
6         2         Tuesday     2020-06-02 19:45:00-04:00
          3         Wednesday   2020-06-03 15:50:00-04:00
          4         Thursday    2020-06-04 10:30:00-04:00
          5         Friday      2020-06-05 11:30:00-04:00
...
8         3         Monday      2020-08-03 14:40:00-04:00
          4         Tuesday     2020-08-04 18:10:00-04:00
          5         Wednesday   2020-08-05 11:10:00-04:00
          6         Thursday    2020-08-06 16:20:00-04:00
          7         Friday      2020-08-07 15:50:00-04:00
Name: High, dtype: datetime64[ns, America/New_York]

I think that the maxTimes object I have created should be giving me the time at which the high of the day occurred per day, however what I then need is:

Monday    12:00
Tuesday   13:25
Wednesday 09:35
Thurs     16:10
Fri       12:05

Is anyone able to help me identify how to get my data to look like this?

Ash
  • 53
  • 5

1 Answers1

2

This should work:

import yfinance as yf
import pandas as pd

df = yf.download(
        tickers = "AAPL",
        period = "60d",
        interval = "5m",
        auto_adjust = True,
        group_by = 'ticker',
        prepost = True,
    )

maxTimes = df.groupby([df.index.month, df.index.day, df.index.day_name()])['High'].idxmax()

# Drop date
maxTimes = maxTimes.apply(lambda x: x.time())

# Drop unused sub-indexes
maxTimes = maxTimes.droplevel(level=[0,1])

# To seconds
maxTimes = maxTimes.apply(lambda t: (t.hour * 60 + t.minute) * 60 + t.second)

# Get average
maxTimes =  maxTimes.groupby(maxTimes.index).mean()

# Back to time
maxTimes = pd.to_datetime(maxTimes, unit='s').apply(lambda x: x.time())

print (maxTimes)

'''
Output:

Datetime
Friday       11:59:32.727272
Monday              14:15:00
Thursday            13:21:40
Tuesday             10:35:00
Wednesday           11:53:45
Name: High, dtype: object

'''
M. Abreu
  • 366
  • 2
  • 5
  • You sir are a hero! Only question I have on top is that the timezone in the original items is eastern (utc -4) - do you know what timezone that output is in? – Ash Aug 25 '20 at 17:17
  • The same as the source data, so UTC-4. – M. Abreu Aug 25 '20 at 17:19