
I have two data frames of different sizes, with these columns:

df_web = (['Event Category', 'ID', 'Total Events', 
           'Unique Events', 'Event Value', 'Avg. Value'])

df_app = (['Event Category', 'ID', 'Total Events',
           'Unique Events', 'Event Value', 'Avg. Value'])

I'm using pandas to merge them into a 'df_final', but I want to sum the values of 'Total Events' that share the same 'ID', so that in the end 'df_final' has no duplicate IDs.

I tried:

df_final_analysis = df_web.groupby(['Event Category', 'ID', 'Total Events',
                                    'Unique Events', 'Event Value', 'Avg. Value'],
                                   as_index=False)['Total Events'].sum()

But it doesn't give me the result that I want.

For example:

df_web

  Video   A   10
  Video   B    5
  Video   C    1
  Video   F    1
  Video   G    1
  Video   H    1

For df_app:

  Video   A   15
  Video   D    3
  Video   C    1

For the df_final_analysis I want:

  Video   A   25
  Video   B    5
  Video   D    3
  Video   C    2
  Video   F    1
  Video   G    1
  Video   H    1

Is there an elegant way to do this?
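
For reference, here is a minimal reproducible sketch of the example data above, assuming the three columns shown are 'Event Category', 'ID' and 'Total Events' (the remaining columns are left out):

import pandas as pd

# Assumed column names, taken from the description above.
cols = ['Event Category', 'ID', 'Total Events']

df_web = pd.DataFrame([['Video', 'A', 10], ['Video', 'B', 5], ['Video', 'C', 1],
                       ['Video', 'F', 1], ['Video', 'G', 1], ['Video', 'H', 1]],
                      columns=cols)

df_app = pd.DataFrame([['Video', 'A', 15], ['Video', 'D', 3], ['Video', 'C', 1]],
                      columns=cols)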

Thabra
  • Please provide a [MCVE]. – IanS Sep 02 '16 at 14:00
  • I tried to improve the question with an example. Thank you. – Thabra Sep 02 '16 at 14:21
  • @Thabra Really - the `df_*` variables aren't DataFrames at all, but lists; the second one isn't even delimited correctly. – Ami Tavory Sep 02 '16 at 14:37
  • Thank you, but I did not understand what you mean. Can you tell me a better way to do what I want? – Thabra Sep 02 '16 at 14:48
  • @Thabra please check [how to make good examples in pandas](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). We need something we can reproduce to help. – IanS Sep 02 '16 at 15:02

1 Answer


A modified version of your code, using pd.concat:

In [46]: df
Out[46]: 
       0  1   2
0  Video  A  10
1  Video  B   5
2  Video  C   1
3  Video  F   1
4  Video  G   1
5  Video  H   1

In [47]: df1
Out[47]: 
       0  1   2
0  Video  A  15
1  Video  D   3
2  Video  C   1

In [59]: pd.concat([df, df1]).groupby([0, 1], as_index=False)[2].sum()
Out[59]: 
       0  1   2
0  Video  A  25
1  Video  B   5
2  Video  C   2
3  Video  D   3
4  Video  F   1
5  Video  G   1
6  Video  H   1
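
0, 1 and 2 above are placeholder column names; with the frames from the question (assuming they carry 'Event Category', 'ID' and 'Total Events'), the same idea would look roughly like this:

# Stack the two frames, then sum 'Total Events' per ('Event Category', 'ID') pair.
df_final = (pd.concat([df_web, df_app], ignore_index=True)
              .groupby(['Event Category', 'ID'], as_index=False)['Total Events']
              .sum())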

using pd.merge:

In [60]: pd.merge(df, df1, how='outer').groupby([0, 1], as_index=False)[2].sum()
Out[60]: 
       0  1     2
0  Video  A  25.0
1  Video  B   5.0
2  Video  C   1.0
3  Video  D   3.0
4  Video  F   1.0
5  Video  G   1.0
6  Video  H   1.0
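
Note that this outer merge joins on all three columns, so the two identical C rows collapse into one (hence C shows 1.0 rather than 2 above). An alternative sketch that merges on just the key columns and then adds the two value columns, with column names assumed from the question:

# Outer-merge on the key columns only; the overlapping 'Total Events' columns get suffixes.
merged = pd.merge(df_web, df_app,
                  on=['Event Category', 'ID'],
                  how='outer', suffixes=('_web', '_app'))

# Sum the web and app totals, treating missing values as 0.
merged['Total Events'] = (merged['Total Events_web'].fillna(0)
                          + merged['Total Events_app'].fillna(0))

df_final = merged[['Event Category', 'ID', 'Total Events']]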
shivsn
  • Thank you very much! But I'm facing problems with this. None of the solutions worked. I received Traceback (most recent call last): File "video_tempo_analise_julho.py", line 25, in df2 = pd.concat([df_web,df_app]).groupby([0,1],as_index=False)[2].sum() File "/home/tha/anaconda2/lib/python2.7/site-packages/pandas/core/generic.py", line 3778, in groupby **kwargs) ... KeyError: 0 – Thabra Sep 02 '16 at 15:26
  • 0, 1, 2 are column names; replace them with your column names, which you mentioned above. – shivsn Sep 02 '16 at 18:26
  • Yes, I noticed that. But my final file still has duplicates... and it does not sum the values with the same ID – Thabra Sep 02 '16 at 18:44
  • try this `pd.merge(df,df1,how='outer').drop_duplicates().groupby([0,1],as_index=False)` – shivsn Sep 02 '16 at 18:46