I need to go through each row in a dataframe looking at a column called 'source_IP_address', and look at the previous 100 rows, so that I can find if there are any rows with the same 'source_IP_address' and where another column states 'authentication failure'.
I have written some code that does this, as I couldn't use Pandas rolling over two columns. Problem is, it is not very fast and I want to know if there is a better way to do it?
function to find in the previous window of n rows, the number of matching axis values, together with number of attribute values in the attribute column
def check_window_for_match(df_w, window_size, axis_col, attr_col, attr_var):
l = []
n_rows = df_w.shape[0]
for i in range(n_rows):
# create a temp dataframe with the previous n rows including current row
temp_df = df_w.iloc[i-(window_size-1):i+1]
#print(temp_df.shape)
# assign the current 'B' value as the axis variable
current_value = df_w[axis_col].iloc[i]
#print(current_value)
#print(temp_df)
# given the temp dataframe of previous window of n_rows, check axis matches against fails
l_temp = temp_df.loc[(temp_df[axis_col] == current_value) & (temp_df[attr_col] == attr_var)].shape[0]
l.append(l_temp)
return l
e.g.
df_test = pd.DataFrame({'B': [0, 1, 2, np.nan, 4, 6, 7, 8, 10, 8, 7], 'C': [2, 10, 'fail', np.nan, 6, 7, 8, 'fail', 8, 'fail', 9]})
df_test
matches_list = check_window_for_match(df_test, window_size=3, axis_col='B', attr_col='C', attr_var='fail')
output: [0, 0, 1, 0, 0, 0, 0, 1, 0, 2, 0]
I want to know if my code is correct and if it is the best way to do it, or there is a better alternative.