I've been working on a little project to sift through backup job reports and was using a couple of Pandas Dataframs to attempt to accomplish this. Now, I understand that manually iterating through a dataframe isn't ideal - but I'm not sure if there's a better method considering what I'm trying to achieve. I'm also rather new to Python and Pandas in general.
So I have a list of backup jobs, uniquely defined by their backup type, system name, and system type. This list is stored in objects_df.
Next, I have a list of every job run and its success or failure status. This is stored in report_df and is a few thousand rows.
I want to 'flag' a job in report_df once it has failed 3 consecutive times. I only care about 'flagging' the first time that happens.
And here's the ugly piece of code I'm trying to use to make this all happen:
for index, row in objects_df.iterrows():
count = 0
for index2, row in report_df.iterrows():
if(objects_df['Task Type'][index].equals(report_df['Task Type'][index2]) and objects_df['Object Name'][index].equals(report_df['Object Name'][index2]) and objects_df['Object Type'][index].equals(report_df['Object Type'][index2])):
if(row['Task Status'] == "Succeeded" and count < 3):
count = 0
if(row['Task Status'] == "Failed"):
count += 1
if(count == 3):
report_df.at[index2, 'flag'] = True
break
As you can see, it goes through my list of objects and for every match it finds in the report, it starts counting failures. Once it reaches 3 consecutive failures it flags it in the report_df.
Unfortunately, this code has been giving me all sorts of problems and doesn't really seem to mark the expected rows as 'flag = true'. Instead, I'm getting duplicate rows and flags on successful runs getting marked as 'true'.
Is there a way to clean this code up or perhaps a better way to approach this project altogether? Even if I do get this working, I'm afraid it'll take forever to go through all the data.
Thanks!