
I have two data frames of different sizes, with these columns:

df_web = (['Event Category', 'ID', 'Total Events', 
           'Unique Events', 'Event Value', 'Avg. Value'])

df_app = (['Event Category', 'ID', 'Total Events',
           'Unique Events', 'Event Value', 'Avg. Value'])

I'm using pandas to merge them into a 'df_final', but I want to sum the values of 'Total Events' that share the same 'ID', so that in the end 'df_final' has no duplicate IDs.

I tried:

df_final_analysis = df_web.groupby(['Event Category', 'ID', 'Total Events',
                                    'Unique Events', 'Event Value', 'Avg. Value'],
                                   as_index=False)['Total Events'].sum()

But it doesn't give me the result that I want.

For example:

df_web

  Video   A   10
  Video   B    5
  Video   C    1
  Video   F    1
  Video   G    1
  Video   H    1

For df_app:

  Video   A   15
  Video   D    3
  Video   C    1

For the df_final_analysis I want:

  Video   A   25
  Video   B    5
  Video   D    3
  Video   C    2
  Video   F    1
  Video   G    1
  Video   H    1

Is there an elegant way to do this?
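
For reference, here is a minimal reproducible sketch of the example data above, assuming the three columns shown are 'Event Category', 'ID' and 'Total Events' (the remaining columns are left out):

import pandas as pd

# Assumed column names, taken from the description above.
cols = ['Event Category', 'ID', 'Total Events']

df_web = pd.DataFrame([['Video', 'A', 10], ['Video', 'B', 5], ['Video', 'C', 1],
                       ['Video', 'F', 1], ['Video', 'G', 1], ['Video', 'H', 1]],
                      columns=cols)

df_app = pd.DataFrame([['Video', 'A', 15], ['Video', 'D', 3], ['Video', 'C', 1]],
                      columns=cols)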

Thabra
  • Please provide a [MCVE]. – IanS Sep 02 '16 at 14:00
  • I tried to improve the question with an example. Thank you. – Thabra Sep 02 '16 at 14:21
  • @Thabra Really - the `df_*` variables aren't DataFrames at all, but lists; the second one isn't even delimited correctly. – Ami Tavory Sep 02 '16 at 14:37
  • Thank you, but I did not understand what you mean. Can you tell me a better way to do what I want? – Thabra Sep 02 '16 at 14:48
  • @Thabra please check [how to make good examples in pandas](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). We need something we can reproduce to help. – IanS Sep 02 '16 at 15:02

1 Answer


A modified version of your code, using pd.concat:

In [46]: df
Out[46]: 
       0  1   2
0  Video  A  10
1  Video  B   5
2  Video  C   1
3  Video  F   1
4  Video  G   1
5  Video  H   1

In [47]: df1
Out[47]: 
       0  1   2
0  Video  A  15
1  Video  D   3
2  Video  C   1

In [59]: pd.concat([df, df1]).groupby([0, 1], as_index=False)[2].sum()
Out[59]: 
       0  1   2
0  Video  A  25
1  Video  B   5
2  Video  C   2
3  Video  D   3
4  Video  F   1
5  Video  G   1
6  Video  H   1
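
0, 1 and 2 above are placeholder column names; with the frames from the question (assuming they carry 'Event Category', 'ID' and 'Total Events'), the same idea would look roughly like this:

# Stack the two frames, then sum 'Total Events' per ('Event Category', 'ID') pair.
df_final = (pd.concat([df_web, df_app], ignore_index=True)
              .groupby(['Event Category', 'ID'], as_index=False)['Total Events']
              .sum())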

using pd.merge:

In [60]: pd.merge(df, df1, how='outer').groupby([0, 1], as_index=False)[2].sum()
Out[60]: 
       0  1     2
0  Video  A  25.0
1  Video  B   5.0
2  Video  C   1.0
3  Video  D   3.0
4  Video  F   1.0
5  Video  G   1.0
6  Video  H   1.0
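
Note that this outer merge joins on all three columns, so the two identical C rows collapse into one (hence C shows 1.0 rather than 2 above). An alternative sketch that merges on just the key columns and then adds the two value columns, with column names assumed from the question:

# Outer-merge on the key columns only; the overlapping 'Total Events' columns get suffixes.
merged = pd.merge(df_web, df_app,
                  on=['Event Category', 'ID'],
                  how='outer', suffixes=('_web', '_app'))

# Sum the web and app totals, treating missing values as 0.
merged['Total Events'] = (merged['Total Events_web'].fillna(0)
                          + merged['Total Events_app'].fillna(0))

df_final = merged[['Event Category', 'ID', 'Total Events']]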
shivsn
  • Thank you very much! But I'm facing problems with this. None of the solutions worked. I received Traceback (most recent call last): File "video_tempo_analise_julho.py", line 25, in df2 = pd.concat([df_web,df_app]).groupby([0,1],as_index=False)[2].sum() File "/home/tha/anaconda2/lib/python2.7/site-packages/pandas/core/generic.py", line 3778, in groupby **kwargs) ... KeyError: 0 – Thabra Sep 02 '16 at 15:26
  • 0, 1, 2 are column names; replace them with your column names, which you mentioned above. – shivsn Sep 02 '16 at 18:26
  • Yes, I noticed that. But my final file still has duplicates... and it does not sum the values with the same ID – Thabra Sep 02 '16 at 18:44
  • try this `pd.merge(df,df1,how='outer').drop_duplicates().groupby([0,1],as_index=False)` – shivsn Sep 02 '16 at 18:46