This post has been really helpful for getting the basics of what I want to do; however, I'm stuck on how to get to the finish line.
I have a large DataFrame (approx. 10k rows), which I'll call df_a. Its first few rows look like this:
zone | value
0 | 12
1 | 12
2 | 99
3 | 12
0 | 12
1 | 12
2 | 12
3 | 99
I am looking to drop consecutive duplicates in 'value', but only within each zone. For example, in the snippet above I want the second '12' for zone 1 to be dropped (and likewise the second '12' for zone 0), so that I end up with:
zone | value
0 | 12
1 | 12
2 | 99
3 | 12
2 | 12
3 | 99
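One way to express "consecutive within a zone" is to compare each row's value with the previous value seen for the same zone, which `groupby('zone')['value'].shift()` provides. A minimal sketch, rebuilding the sample rows above as df_a (the frame construction is only for illustration):

```python
import pandas as pd

# Sample rows from the question
df_a = pd.DataFrame({
    "zone":  [0, 1, 2, 3, 0, 1, 2, 3],
    "value": [12, 12, 99, 12, 12, 12, 12, 99],
})

# Previous 'value' within the same zone (NaN for the first row of each zone)
prev_in_zone = df_a.groupby("zone")["value"].shift()

# Keep a row unless it repeats the previous value for its zone;
# comparisons against NaN are True, so each zone's first row is kept
result = df_a[df_a["value"] != prev_in_zone]
print(result)
```

This treats rows as "consecutive" when they are adjacent within their zone, even if rows from other zones sit between them, which matches the expected output above. Note that repeated NaN values in 'value' would not be dropped, since NaN != NaN.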
My initial idea was to loop over the list of zones, automatically create a new variable for each zone (named after the zone), and then run my drop-duplicates code on each one (based on this answer). However, this doesn't work:
data_category_range = df_a['zone'].unique()
data_category_range = data_category_range.tolist()
for i,value in enumerate(data_category_range):
    data_category_range['zone_{}'.format(i)] = df_a[df_a['zone'] == value]
# de-duplicate
cols = ["zone","value"]
de_dup = df_a[cols].loc[(df_a[cols].shift() != df_a[cols]).any(axis=1)]
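Two things seem to go wrong in the attempt above: after `.tolist()`, `data_category_range` is a plain Python list, so indexing it with a string such as `'zone_0'` raises a TypeError; and the de-duplication line runs on all of df_a with both columns, and since 'zone' changes on every row, `.any(axis=1)` is True everywhere and nothing is dropped. A sketch of the same loop that keeps the per-zone frames in a dict (the names `zone_frames` and `subset` are illustrative, not from the original):

```python
import pandas as pd

df_a = pd.DataFrame({
    "zone":  [0, 1, 2, 3, 0, 1, 2, 3],
    "value": [12, 12, 99, 12, 12, 12, 12, 99],
})

# A dict replaces dynamically named variables such as zone_0, zone_1, ...
zone_frames = {}
for zone in df_a["zone"].unique():
    subset = df_a[df_a["zone"] == zone]
    # Within one zone, drop rows whose value equals the previous row's value
    zone_frames["zone_{}".format(zone)] = subset.loc[subset["value"].shift() != subset["value"]]

# Stitch the per-zone pieces back together in the original row order
de_dup = pd.concat(zone_frames.values()).sort_index()
print(de_dup)
```

Each per-zone frame is still reachable by name, e.g. `zone_frames["zone_1"]`, without creating variables at runtime.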
(This loop sits inside another loop that iterates over DataFrames with different 'zone' values, so the variables need to be dynamic. I'm open to alternatives, as I understand this isn't best practice.)
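For the outer loop, one alternative to dynamic variables is to wrap the grouped comparison in a function and apply it to each DataFrame, since `groupby` adapts to whatever zones a frame contains. A sketch under assumptions: `drop_consecutive_by_zone` is a hypothetical helper, and `frames` stands in for whatever collection the outer loop actually iterates over.

```python
import pandas as pd

def drop_consecutive_by_zone(df, zone_col="zone", value_col="value"):
    """Drop rows whose value repeats the previous value of the same zone."""
    prev = df.groupby(zone_col)[value_col].shift()
    return df[df[value_col] != prev]

# Hypothetical inputs: frames whose zone labels differ from one another
frames = [
    pd.DataFrame({"zone": [0, 1, 0, 1], "value": [5, 7, 5, 8]}),
    pd.DataFrame({"zone": ["a", "b", "a", "b"], "value": [1, 1, 2, 1]}),
]

# No per-zone variables needed; each frame is processed the same way
cleaned = [drop_consecutive_by_zone(df) for df in frames]
for df in cleaned:
    print(df)
```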
Thanks!