I have this DataFrame.
High Close
Close Time
2022-10-23 21:41:59.999 19466.02 19461.29
2022-10-23 21:42:59.999 19462.48 19457.83
2022-10-23 21:43:59.999 19463.13 19460.09
2022-10-23 21:44:59.999 19465.15 19463.76
I'm attempting to check if Close
at a later date (up to 600 rows later but no more) goes above the close of an earlier date & High
is lower than the High of the same earlier date then I want to get the location of both the earlier and later date and make new columns in the Dataframe with those locations.
Expected output:
High Close LC HC HH LH
Close Time
2022-10-23 21:41:59.999 19466.02 19461.29 19461.29 NaN 19466.02 NaN
2022-10-23 21:42:59.999 19462.48 19457.83 NaN NaN NaN NaN
2022-10-23 21:43:59.999 19463.13 19460.09 NaN NaN NaN NaN
2022-10-23 21:44:59.999 19465.15 19463.76 NaN 19463.76 NaN 19465.15
This is the code I have tried
# Checking if conditions are met
for i in range(len(df)):
for a in range(i,600):
if (df.iloc[i:, 1] < df.iloc[a, 1]) & (df.iloc[i:, 0] > df.iloc[a, 0]):
# Creating new DataFrame columns
df['LC'] = df.iloc[i, 1]
df['HC'] = df.get_loc[i,1]
df['HH'] = df.get_loc[a, 0]
df['LH'] = df.get_loc[a, 0]
else:
continue
This line: if (df.iloc[i:, 1] < df.iloc[a, 1]) & (df.iloc[i:, 0] > df.iloc[a, 0]):
Is causing error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I believe I should be using any()
instead of an if statement but I am unsure of how to apply it. I also think that there many be an issue with the way I am using the df.get_loc[]
but I am unsure. I'm a pandas beginner so if it is obvious I apologize
Here is an image to help visualise what I am attempting to do using a candlestick chart
what I want to do is check if HC is higher than LC and LH is lower than HH then add that data to new columns in the DataFrame
Here is an additional way I tried to achieve the desired output
idx_close, idx_high = map(df.columns.get_loc, ["Close", "High"])
# Check Conditions
for i in range(len(df)):
bool_l = [((df.iloc[i, idx_close] < df.iloc[a, idx_close]) &
(df.iloc[i, idx_high] > df.iloc[a, idx_high])
).any() for a in range(i, 600)]
# Creating new DataFrame columns
df.loc[i, 'LC'] = df.iloc[i,1]
df.loc[bool_l, 'HC'] = df.iloc[bool_l, 1]
# Creating new DataFrame columns
df.loc[i, 'HH'] = df.iloc[i, 0]
df.loc[bool_l, 'LH'] = df.iloc[bool_l, 0]
And I get an error IndexError: Boolean index has wrong length: 600 instead of 2867
On the line df.loc[bool_l, 'HC'] = df.iloc[bool_l, 1]
I assume the error comes from the range(i,600)
but I don't know how to get around it