I have transaction data that I hope to resample in a fashion similar to OHLC stock market prices.
- The goal is to display a meaningful price summary for each day
However, my challenge is that the transactional data is sparse.
There are days without transactions
The opening price of the day might happen in the middle of the day and does not roll over from the previous day automatically
I can do the naive OHLC with resample()
, as seen in the example below. Because the data is sparse, the normal straightforward resample gives unideal results. To get a meaningful price sample, the following conditions should be met too:
The opening price is always set to the closing price of the previous day
If any day does not have transactions, all OHLC values are the closing price of the previous day ("price does not move")
I can do this in pure Python, as it is not that difficult, but it is not very computationally efficient for high volumes of data. Thus, my question is, would Pandas offer any clever way of doing resample
or aggregate
satisfying the conditions above, but without needing to loop values in Python manually?
The example code is below:
import pandas as pd
# Transactions do not have regular intervals and may miss days
data = {
"timestamp": [
pd.Timestamp("2020-01-01 01:00"),
pd.Timestamp("2020-01-01 05:00"),
pd.Timestamp("2020-01-02 03:00"),
pd.Timestamp("2020-01-04 04:00"),
pd.Timestamp("2020-01-05 00:00"),
],
"transaction": [
100.00,
102.00,
103.00,
102.80,
99.88
]
}
df = pd.DataFrame.from_dict(data, orient="columns")
df.set_index("timestamp", inplace=True)
print(df)
transaction
timestamp
2020-01-01 01:00:00 100.00
2020-01-01 05:00:00 102.00
2020-01-02 03:00:00 103.00
2020-01-04 04:00:00 102.80
2020-01-05 00:00:00 99.88
# https://stackoverflow.com/a/36223274/315168
naive_resample = df["transaction"].resample("1D") .agg({'open': 'first', 'high': 'max', 'low': 'min', 'close': 'last'})
print(naive_resample)
In this result, you can see that:
open/close do not match over daily boundaries
if a day does not have transactions price is marked as
NaN
open high low close
timestamp
2020-01-01 100.00 102.00 100.00 102.00
2020-01-02 103.00 103.00 103.00 103.00
2020-01-03 NaN NaN NaN NaN
2020-01-04 102.80 102.80 102.80 102.80
2020-01-05 99.88 99.88 99.88 99.88