
I have a dataframe like:

Task  ID  Value  Sno
 A    A1    5     1
 A    A1    2     2
 B    A1    4     3
 A    A1    1     4
 A    B1    10    5
 C    B1    3     6
 D    B1    5     7
 D    B1    2     8
 D    B1    12    9
 E    C1    25    10

And the expected output should look like

Task  ID  Value  Sno
 A    A1    7    1,2
 B    A1    4     3
 A    A1    1     4
 A    B1    10    5
 C    B1    3     6
 D    B1    19   7,8,9
 E    C1    25    10

So, for each record where the ID is the same, consecutive rows with the same 'Task' have to be merged and their 'Value' summed. My approach is listed below, but it does not handle cases where the same 'Task' occurs more than twice in a row (the 'D' task in the example). Also, I am not able to drop the rows after merging, and I don't know why.

    for j in range(len(df)-1):
        if df.loc[j, 'Task'] == df.loc[j+1, 'Task'] and df.loc[j, 'ID'] == df.loc[j+1, 'ID']:
            df.loc[j, 'Value'] = df.loc[j, 'Value'] + df.loc[j+1, 'Value']
            df.loc[j, 'ID'] = str(df.loc[j, 'ID']) + "," + str(df.loc[j+1, 'ID'])
            df.drop([j+1])
  • Duplicate Task A in a row is a typo? – crissal May 16 '21 at 13:49
  • Does this answer your question? [How to group dataframe rows into list in pandas groupby](https://stackoverflow.com/questions/22219004/how-to-group-dataframe-rows-into-list-in-pandas-groupby) – roganjosh May 16 '21 at 13:50
  • @crissal Nope, the data follows the same format – Xavier May 16 '21 at 14:04
  • @roganjosh I have seen that post and similar ones too, but there the grouping is done across all instances; what I want is grouping only for consecutive records, as shown – Xavier May 16 '21 at 14:07

1 Answer


This is pretty similar to a groupby-and-aggregate with a join. However, you first have to create a helper column that identifies the consecutive groups:

# Flag rows where Task or ID differs from the previous row; the cumulative
# sum then gives each run of consecutive (Task, ID) pairs its own label
u = df[['Task','ID']]
g = u.ne(u.shift()).any(axis=1).cumsum()

# Sum Value and join the Sno values of each run as a comma-separated string
d = {"Value":"sum","Sno":lambda x: ','.join(x.astype(str))}
#d = {"Value":"sum","Sno":lambda x: ','.join([str(i) for i in x])}

out = df.groupby(['Task','ID',g]).agg(d).droplevel(-1).reset_index()

print(out)

  Task  ID  Value    Sno
0    A  A1      7    1,2
1    A  A1      1      4
2    A  B1     10      5
3    B  A1      4      3
4    C  B1      3      6
5    D  B1     19  7,8,9
6    E  C1     25     10
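To make the grouping key concrete, here is a quick sketch that rebuilds the question's sample data and shows what the helper series `g` evaluates to (each run of consecutive identical `(Task, ID)` pairs gets one label):

```python
import pandas as pd

# Rebuild the question's sample data
df = pd.DataFrame({
    'Task':  ['A', 'A', 'B', 'A', 'A', 'C', 'D', 'D', 'D', 'E'],
    'ID':    ['A1', 'A1', 'A1', 'A1', 'B1', 'B1', 'B1', 'B1', 'B1', 'C1'],
    'Value': [5, 2, 4, 1, 10, 3, 5, 2, 12, 25],
    'Sno':   [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
})

u = df[['Task', 'ID']]
# True wherever Task or ID changes from the previous row; cumsum turns
# each run of unchanged rows into a single group label
g = u.ne(u.shift()).any(axis=1).cumsum()
print(g.tolist())  # [1, 1, 2, 3, 4, 5, 6, 6, 6, 7]
```

Note that rows 1 and 3 (both `A`/`A1`) get different labels (1 and 3) because they are not consecutive, which is exactly the distinction a plain `groupby(['Task','ID'])` would miss.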
  • this answers the question perfectly, but is there a way the order of the data can be preserved, as it is of importance? Currently the output is sorted based on "Task". Also, how to include other columns apart from those mentioned (their values are the same for the rows being merged)? – Xavier May 16 '21 at 14:56
  • @Xavier you can pass `sort=False` in groupby `df.groupby(['Task','ID',g],sort=False)....` – anky May 16 '21 at 14:57
  • okay got it, and can you guide on how to include other columns in the resultset as well – Xavier May 16 '21 at 15:29
  • @Xavier if they are not a grouper column, you would have to pass some sort of aggregate function for them in the dictionary, e.g. `{...., 'Colx': 'first'}` – anky May 16 '21 at 15:57
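Putting the two follow-up comments together, here is a sketch that preserves the original row order with `sort=False` and carries an extra column through with `'first'`. The `Extra` column is hypothetical, added only to illustrate the aggregation; it is assumed constant within each merged run, as the comment describes:

```python
import pandas as pd

df = pd.DataFrame({
    'Task':  ['A', 'A', 'B', 'A', 'A', 'C', 'D', 'D', 'D', 'E'],
    'ID':    ['A1', 'A1', 'A1', 'A1', 'B1', 'B1', 'B1', 'B1', 'B1', 'C1'],
    'Value': [5, 2, 4, 1, 10, 3, 5, 2, 12, 25],
    'Sno':   [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    # Hypothetical extra column, constant within each consecutive run
    'Extra': ['x', 'x', 'y', 'z', 'w', 'v', 'u', 'u', 'u', 't'],
})

u = df[['Task', 'ID']]
g = u.ne(u.shift()).any(axis=1).cumsum()

# sort=False keeps groups in order of first appearance instead of
# sorting by Task; 'first' keeps one value of Extra per merged run
d = {'Value': 'sum',
     'Sno': lambda x: ','.join(x.astype(str)),
     'Extra': 'first'}
out = df.groupby(['Task', 'ID', g], sort=False).agg(d).droplevel(-1).reset_index()
print(out)
```

With `sort=False` the result rows come out in the same order as the expected output in the question (A, B, A, A, C, D, E) rather than alphabetically by Task.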