delete rows based on if consecutive rows are similar - Python

Question

I have this data frame and want to remove rows based on this set of rules. If consecutive rows have the same 'area' and 'local' value and the 'group_name' is different then I want to remove the first row:

df = pd.DataFrame()
df['time']  = pd.date_range("2018-01-01", freq = "s", periods = 10)
df['area'] = [1,1,1,2,2,2,3,3,4,4]
df['local'] = [1,1,1,1,2,2,2,2,2,2]
df['group_name'] = [1,1,2,2,2,3,3,3,4,4] 
df['value'] = [1,4,3,2,5,6,2,1,7,8]

The image above shows the table and I would want to remove row 1 and 4.

I have tried using duplicated() on the subset of Area, Local and Group Name, but this not keep all the unique ones that I need

Please help me out!

Hi, what is it that you tried for this as code? That way you help people here help you better. As it stands, it seems like you're making other people do all the job, which is kind of rude, right? — , Jan 14 '22 at 14:15
Does this answer your question? [Remove duplicates from dataframe, based on two columns A,B, keeping row with max value in another column C](https://stackoverflow.com/questions/32093829/remove-duplicates-from-dataframe-based-on-two-columns-a-b-keeping-row-with-max) — JonSG, Jan 14 '22 at 14:16
@Neither Apologies. I have tried some code but got lost. I Will upload it with the question — user3234242, Jan 14 '22 at 14:25
@JonSG Yes that is almost there but I still want to keep for example row 8 and 9 even though they are duplicates? Please let me know if that is not clear — user3234242, Jan 14 '22 at 14:26
No need to apologize to me at all :) I retracted the downvote. I came up with this: we form 2 masks that you describe: `m1 = df[["area", "local"]].diff().eq(0).all(axis=1).shift(-1)` and `m2 = df["group_name"].diff().ne(0).shift(-1)`. When they are *both* `True`, that's a row we'd like to drop; so `new_df = df[~(m1 & m2)].copy()` (`copy` is to avoid possible SettingWithCopyWarning afterwards). But I'm not sure if this passes all the cases you have in mind... — , Jan 14 '22 at 14:52
`m1` is detecting where "area" and "local" columns have the same values as the *next* row: take the difference (`diff`), see if it's equal to 0 (i.e., same) for both columns (`eq` and `all`) and shift up by 1 to have "the same as the next row". `m2` is finding where "group_name" is different from the next row: take the difference again, this time check if *not* equal to 0 since we look if it differs, and shift again as before. — , Jan 14 '22 at 14:55

score 0 · Answer 1 · answered Jan 14 '22 at 14:28

You can do this by writing a number of if statements like this:

for i in range(len(df)-1):
    if df.loc[i]['local'] == df.loc[i+1]['local']:
        if df.loc[i]['area'] == df.loc[i+1]['area']:
            if df.loc[i]['group_name'] != df.loc[i+1]['group_name']:
                df.drop(i, inplace=True)

delete rows based on if consecutive rows are similar - Python

1 Answers1