
Here is a DataFrame which has minute-wise OHLC data from 2011 to 2021.

I want to make another column named "first15 high" which holds the first-15-minute high, i.e. the highest high between 9:15 and 9:30, for that day.

The desired output (in the yellow column) is below. The DataFrame has more than 10 years of data (i.e. it contains more than 2000 days).

[screenshot of the desired output]

Joseph arasta
  • Can someone answer it? – Joseph arasta May 26 '21 at 05:51
  • Please, read [How to create a Minimal, Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example) and [How to make good reproducible pandas examples](https://stackoverflow.com/q/20109391/15239951) – Corralien May 26 '21 at 05:54

1 Answer


The data you present is maintained in Excel, but I will answer on the assumption that pandas is available. As sample data, get minute data from Yahoo Finance and first build a frame holding the first 15 rows of each day, grouped by year, month, and day. Group that frame by date and find the maximum value. Then merge the frame containing the maximum values back onto the original DataFrame. If you're looking for a quick answer, posting the data as text and providing the code you're working on is a must.

import pandas as pd
import yfinance as yf
# sample data: 1-minute AAPL bars from Yahoo Finance
df = yf.download("AAPL", interval='1m', start="2021-05-18", end="2021-05-25")
df.index = pd.to_datetime(df.index)
# drop the timezone and keep the calendar date in its own column for grouping/merging
df.index = df.index.tz_localize(None)
df['date'] = df.index.date
# first 15 rows of each day
first_15min = df.groupby([df.index.year,df.index.month,df.index.day])['High'].head(15).to_frame()
# maximum 'High' of those first 15 rows, per date
first_15min = first_15min.groupby([first_15min.index.date]).max()
# attach the per-date maximum back onto every minute row
df.merge(first_15min, left_on='date', right_on=first_15min.index, how='inner')

    Open    High_x  Low     Close   Adj Close   Volume  date    High_y
0   125.980003  126.099998  125.970001  126.065002  126.065002  0   2021-05-17  126.099998
1   126.060097  126.070000  125.900002  125.910004  125.910004  135988  2021-05-17  126.099998
2   125.900002  125.900002  125.790298  125.880096  125.880096  172001  2021-05-17  126.099998
3   125.889999  125.889999  125.790001  125.860001  125.860001  81338   2021-05-17  126.099998
4   125.870003  125.968201  125.870003  125.919998  125.919998  187059  2021-05-17  126.099998
...     ...     ...     ...     ...     ...     ...     ...     ...
1942    127.490097  127.557404  127.480003  127.540001  127.540001  161355  2021-05-24  126.419998
1943    127.540001  127.559998  127.480003  127.485001  127.485001  143420  2021-05-24  126.419998
1944    127.485001  127.529999  127.449997  127.480003  127.480003  132487  2021-05-24  126.419998
1945    127.479897  127.500000  127.449997  127.470001  127.470001  98478   2021-05-24  126.419998
1946    127.480003  127.550003  127.460098  127.532303  127.532303  128118  2021-05-24  126.419998
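
For reference, the same head(15)-then-max idea can also be written without a separate merge by broadcasting the per-day maximum with groupby().transform(). This is my own variation on the answer, not part of it, and it continues from the df built above; the column name first15_high is just illustrative.

# Variation (not from the answer): take the first 15 rows of each 'date'
# group and broadcast their maximum 'High' back to every row of that date.
df['first15_high'] = df.groupby('date')['High'].transform(lambda s: s.iloc[:15].max())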
r-beginners
  • first_15min = df.groupby([df.index.year,df.index.month,df.index.day])['High'].head(5).to_frame() Here, if any data is missing, it won't work. – Joseph arasta May 27 '21 at 09:49
  • If it is NA, you can use fillna() to set it to zero before grouping. – r-beginners May 27 '21 at 10:07
  • No, it will be a completely missing row. For instance, 9:20 AM may be missing completely. In that case head(15) will also pull in some data from beyond the first 15 minutes. I hope you got the point. – Joseph arasta May 27 '21 at 10:09
  • I reviewed the code and for some reason it said head(5), but head(15) is correct. I fixed it. – r-beginners May 27 '21 at 10:17
  • Please point out any issues other than those that have been fixed. – r-beginners May 27 '21 at 10:19
  • When using head(15) we are assuming that at 9:30 AM we will always have 15 rows above it (having the same date), which might not be the case, as some data might be missing and there might be fewer than 15 rows. Can we sum the rows above with the condition that the date is the same? – Joseph arasta May 27 '21 at 10:21
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/232945/discussion-between-joseph-arasta-and-r-beginners). – Joseph arasta May 27 '21 at 10:42
  • The idea is to perform a time-specific extraction on your data before grouping. For example, you can extract it like this: `df[df['hourminute'].isin(['9:30', '9:31', '9:32', ...])]` – r-beginners May 27 '21 at 12:05
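
As a sketch of that last suggestion (my own illustration, not code from the thread): filter each day down to the opening 15-minute window by clock time with between_time() before grouping, so a missing minute bar cannot pull rows from outside the window into the calculation. The times 09:15–09:29 match the question's market; for the AAPL sample above, the window would be 09:30–09:44. The column name first15_high is an assumption.

# Sketch: time-based extraction before grouping, so missing minute bars
# cannot widen the window. Assumes df has a DatetimeIndex and a 'High'
# column, as in the answer above.
window = df.between_time('09:15', '09:29')                    # bars stamped 09:15 .. 09:29
first15 = window.groupby(window.index.date)['High'].max()     # highest high per date within the window
# map the per-date value back onto every minute row of that date
df['first15_high'] = pd.Series(df.index.date, index=df.index).map(first15)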