I have a data frame in pandas that is organized like so:
btc_price['btc_price'] = pd.to_numeric(btc_price['btc_price'].str.replace(',', ''))
btc_price.head(n=120)
Out[4]:
btc_price
time
2017-08-27 22:50:00 4,389.6113
2017-08-27 22:51:00 4,389.0850
2017-08-27 22:52:00 4,388.8625
2017-08-27 22:53:00 4,389.7888
2017-08-27 22:56:00 4,389.9138
2017-08-27 22:57:00 4,390.1663
2017-08-27 22:58:00 4,390.2600
2017-08-27 22:59:00 4,392.4013
2017-08-27 23:00:00 4,391.6588
2017-08-27 23:01:00 4,391.9213
2017-08-27 23:02:00 4,394.0113
2017-08-27 23:03:00 4,396.9713
2017-08-27 23:04:00 4,397.3350
2017-08-27 23:05:00 4,397.0700
2017-08-27 23:06:00 4,398.6188
2017-08-27 23:07:00 4,398.5725
2017-08-27 23:08:00 4,397.4713
2017-08-27 23:09:00 4,398.0938
2017-08-27 23:10:00 4,398.7775
2017-08-27 23:11:00 4,398.0200
2017-08-27 23:12:00 4,397.9513
2017-08-27 23:13:00 4,398.0613
2017-08-27 23:14:00 4,398.0900
2017-08-27 23:15:00 4,398.0063
2017-08-27 23:16:00 4,397.6088
2017-08-27 23:17:00 4,394.3763
2017-08-27 23:46:00 4,389.1100
2017-08-27 23:48:00 4,390.6763
2017-08-27 23:49:00 4,392.5388
2017-08-27 23:49:00 4,392.5388
...
2017-08-28 00:51:00 4,367.5738
2017-08-28 00:51:00 4,367.5738
2017-08-28 00:52:00 4,367.7888
2017-08-28 00:53:00 4,368.4188
2017-08-28 00:54:00 4,368.8225
2017-08-28 00:55:00 4,368.7438
2017-08-28 00:57:00 4,368.4700
2017-08-28 00:58:00 4,367.9963
2017-08-28 00:59:00 4,366.4750
2017-08-28 01:00:00 4,359.1988
2017-08-28 01:01:00 4,355.2825
2017-08-28 01:02:00 4,352.3675
2017-08-28 01:03:00 4,354.2188
2017-08-28 01:04:00 4,353.5263
2017-08-28 01:05:00 4,354.2488
2017-08-28 01:06:00 4,358.8063
2017-08-28 01:07:00 4,359.5738
2017-08-28 01:08:00 4,361.7313
2017-08-28 01:09:00 4,360.8638
2017-08-28 01:10:00 4,363.0750
2017-08-28 01:11:00 4,362.3375
2017-08-28 01:12:00 4,362.3338
2017-08-28 01:13:00 4,358.8000
2017-08-28 01:14:00 4,354.0463
2017-08-28 01:15:00 4,356.1950
2017-08-28 01:16:00 4,359.5975
2017-08-28 01:17:00 4,360.1588
2017-08-28 01:18:00 4,362.2338
2017-08-28 01:19:00 4,363.7900
2017-08-28 01:20:00 4,362.6150
I would like to create a new column change
that has a value of -1,0,1. These should correspond to a 5% decrease in price over the past hour (-1), "no change" (0), and a 5% increase in price over the past hour(1). Also, the value of one hour should be mutable so I could change it to one day or 30 minutes for example, as I see fit.
I found similar questions here and here but I am new to python and am not really sure how to apply these solutions specifically to my problem.
The other option is to calculate the avg price for each hour and then calculate the % change on an hourly basis, but I would prefer to be able to use a rolling timeframe.
I have also tried doing this in R with no luck. Please help.
I started by trying:
btc_price['change'] = btc_price.pct_change(periods=60, fill_method='pad', limit=None, freq=None)
This works but does not quite give me what I am looking for, I would like to compare each value to the min and max values for the past "time frame" and calculate the % change based on this value instead of simply comparing two rows.
What I want to end up with is something like this (incomplete):
# Calculate the % change in btc_price
def calc_change(df):
array = df.values
a = array[:,1]
# Apply % change comparison to timeframe
def rolling(df, period, func, min_periods = None):
if min_periods is None:
min_periods = period
result = pd.Series(np.nan, index = df.index)
for i in range(1, len(df) + 1):
sub_df = df.iloc[max(i)]
I believe I could use something like df.rolling()
found here but am not quite sure if this is exactly what I want because I dont quite understand how it works. An explanation would be great.