-1

My goal:

I have a dataset that gets generated every day at random hours leading to the first row to start at a random time. I want to make this dataset start from the nearest midnight date. For example, if the date on the first row is 2022-05-09 15:00:00, I would have to slice the data to make it start from the nearest midnight, in this case: 2022-05-10 00:00:00

Here's what the dataset looks like:

data

What I have tried:

I had the idea of locating the index of the first occurrence of my desired timestamp and applying iloc to create the desired data set.

match_timestamp = "00:00:00"
[df[df.index.strftime("%H:%M:%S") == match_timestamp].first_valid_index()]

results: [Timestamp('2022-05-10 00:00:00')]

However, this would only result in extracting the timestamp where it first appears, and I would not be able to apply iloc to the row value. As of now, I'm stuck and can't think of a more elegant solution, which I'm sure exists.

I would be grateful if you could recommend a better method to this. Thank you in advance!

Here's the complete code to extract the df:

pip install ccxt

import pandas as pd
import ccxt

exchange = ccxt.okx({'options': {'defaultType': 'futures', 'enableRateLimit': True}})
markets = exchange.load_markets()

url = 'https://www.okex.com'
tickers = pd.DataFrame((requests.get(url+'/api/v5/market/tickers?instType=FUTURES').json())['data'])
tickers = tickers.drop('instType', axis=1)
futures_tickers = list(tickers['instId'])

symbol = 'LINK-USD-220930'
candlestick_chart= exchange.fetch_ohlcv(symbol, '1h', limit=500)
candlestick_df = pd.DataFrame(candlestick_chart)
candlestick_df.columns = ['date', 'open', 'high', 'low', 'close', 'volume']

candlestick_df['date'] = pd.to_datetime(candlestick_df['date'], unit='ms')
candlestick_df['date'] = candlestick_df['date'] + pd.Timedelta(hours=8)

df  = candlestick_df
df

The dictionary format: (as suggested)

{'open': {Timestamp('2022-05-09 15:00:00'): 9.742, Timestamp('2022-05-09 16:00:00'): 9.731, Timestamp('2022-05-09 17:00:00'): 9.743, Timestamp('2022-05-09 18:00:00'): 9.684, Timestamp('2022-05-09 19:00:00'): 9.206, Timestamp('2022-05-09 20:00:00'): 9.43, Timestamp('2022-05-09 21:00:00'): 9.316, Timestamp('2022-05-09 22:00:00'): 9.403, Timestamp('2022-05-09 23:00:00'): 9.215, Timestamp('2022-05-10 00:00:00'): 9.141}, 'high': {Timestamp('2022-05-09 15:00:00'): 9.835, Timestamp('2022-05-09 16:00:00'): 9.75, Timestamp('2022-05-09 17:00:00'): 9.788, Timestamp('2022-05-09 18:00:00'): 9.697, Timestamp('2022-05-09 19:00:00'): 9.465, Timestamp('2022-05-09 20:00:00'): 9.469, Timestamp('2022-05-09 21:00:00'): 9.515, Timestamp('2022-05-09 22:00:00'): 9.413, Timestamp('2022-05-09 23:00:00'): 9.308, Timestamp('2022-05-10 00:00:00'): 9.223}, 'low': {Timestamp('2022-05-09 15:00:00'): 9.699, Timestamp('2022-05-09 16:00:00'): 9.596, Timestamp('2022-05-09 17:00:00'): 9.674, Timestamp('2022-05-09 18:00:00'): 8.739, Timestamp('2022-05-09 19:00:00'): 9.11, Timestamp('2022-05-09 20:00:00'): 9.3, Timestamp('2022-05-09 21:00:00'): 9.208, Timestamp('2022-05-09 22:00:00'): 9.174, Timestamp('2022-05-09 23:00:00'): 9.035, Timestamp('2022-05-10 00:00:00'): 8.724}, 'close': {Timestamp('2022-05-09 15:00:00'): 9.725, Timestamp('2022-05-09 16:00:00'): 9.745, Timestamp('2022-05-09 17:00:00'): 9.682, Timestamp('2022-05-09 18:00:00'): 9.18, Timestamp('2022-05-09 19:00:00'): 9.426, Timestamp('2022-05-09 20:00:00'): 9.32, Timestamp('2022-05-09 21:00:00'): 9.397, Timestamp('2022-05-09 22:00:00'): 9.229, Timestamp('2022-05-09 23:00:00'): 9.152, Timestamp('2022-05-10 00:00:00'): 8.82}, 'volume': {Timestamp('2022-05-09 15:00:00'): 3663.0, Timestamp('2022-05-09 16:00:00'): 6603.0, Timestamp('2022-05-09 17:00:00'): 2855.0, Timestamp('2022-05-09 18:00:00'): 20084.0, Timestamp('2022-05-09 19:00:00'): 8972.0, Timestamp('2022-05-09 20:00:00'): 5551.0, Timestamp('2022-05-09 21:00:00'): 8218.0, Timestamp('2022-05-09 22:00:00'): 7651.0, Timestamp('2022-05-09 23:00:00'): 6935.0, Timestamp('2022-05-10 00:00:00'): 10409.0}}
Royce Anton Jose
  • 218
  • 1
  • 10
  • please add a [mcve] and provide your dataframe in plain text not images. – Umar.H May 21 '22 at 19:07
  • The complete dataset is shared as a google drive link. I hope that helps! – Royce Anton Jose May 21 '22 at 19:09
  • please do not provide external links, you need to provide the complete example in your post otherwise you'll be downvoted/closed. see the rules in the link I provided above. – Umar.H May 21 '22 at 19:09
  • Added complete code to extract the data frame since the external link is not helpful. – Royce Anton Jose May 21 '22 at 19:16
  • `cctx` is not from the standard lib, you're expecting others to install an unknown lib onto their machines. do this - `print(df.head(10).to_dict())` and paste it in your question and format it as code. – Umar.H May 21 '22 at 19:17
  • also please provide what your dataframe output should look like. read this [how to make a good pandas question/example](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – Umar.H May 21 '22 at 19:18

1 Answers1

1

my minimalistic approach of the pandaNewstarter you simply can apply it to your candlestick_df:

import pandas as pd
import datetime
df = pd.read_csv("data.csv")
df.dtypes
# convert date column to dtype timestamp
df.date = pd.to_datetime(df.date) 

# get min value from date colum
min_date = df.date.min()

# from min get next day midnight timestamp value
NextDay_Date = (min_date + datetime.timedelta(days=1)).replace(hour=0, minute=0, second=0, microsecond=0)
NextDay_Date

# create new DateFrame by slicing original
df2 = df[df.date >= NextDay_Date].copy()

out:

print(NextDay_Date)
2022-05-10 00:00:00
print(df2)
                   date   open   high    low  close   volume
9   2022-05-10 00:00:00  9.141  9.223  8.724  8.820  10409.0
10  2022-05-10 01:00:00  8.755  8.979  8.558  8.832  11522.0
11  2022-05-10 02:00:00  8.815  8.880  8.304  8.593  20969.0
12  2022-05-10 03:00:00  8.618  8.720  8.370  8.610  15794.0
13  2022-05-10 04:00:00  8.610  8.929  8.610  8.736   9410.0
..                  ...    ...    ...    ...    ...      ...
NoobVB
  • 989
  • 6
  • 10
  • Thank you so much!! Can you also tell me how you paste your output results as tables on StackOverflow? For some strange reasons, my markdown tables are not showing the way they should and are leading to downvotes. Thank you once again! – Royce Anton Jose May 21 '22 at 19:33
  • 1
    You are welcome, as per code display. I use PyCharm, - I simply copy/paste results>on Stacks highlight them and mark them as a code using `{}` icon. – NoobVB May 21 '22 at 19:36
  • Thank you for that! I will do that next time! – Royce Anton Jose May 21 '22 at 19:38