2

Right now I have 2 dataframes. One with donor information and one with fundraiser information. Ideally what I want to do is for each donor sum up their donations and store it in the fundraiser dataframe. Problems are that it is possible to have a fundraiser in multiple events (so need to use the id and event as the key) and not all fundraisers actually collect anything. I've figured out how to groupby the donation dataframe to calculate the amount raised by the fundraisers that collected anything, but I have no idea how to then get that information over to the fundraiser dataframe :(

import pandas as pd
Donors = pd.DataFrame({"event": pd.Series([1,1,1,1,2,2]), "ID": pd.Series(['a','a','b','c','a','d']), "amount": ([1,2,3,4,5,6])})
fundraisers = pd.DataFrame({"event": pd.Series([1,1,1,2,2,1]), "ID": pd.Series(['a','b','c','a','d','e'])})

foo = Donors.groupby(["event", "ID"])["amount"].sum().reset_index()

ideally I want the fundraiser frame to look like:

event  |    id   | amount raised
--------------------------
1      |    a    |  3
1      |    b    |  3
1      |    c    |  4
1      |    e    |  0
2      |    a    |  5
2      |    d    |  6
Wizuriel
  • 3,617
  • 4
  • 21
  • 26

2 Answers2

3

Do an outer join:

In [15]: pd.merge(foo,fundraisers,how='outer').fillna(0)
Out[15]:
   event ID  amount
0      1  a       3
1      1  b       3
2      1  c       4
3      2  a       5
4      2  d       6
5      1  e       0

If you need the DataFrame to be sorted by the 'event' column then you can do

In [16]: pd.merge(foo,fundraisers,how='outer').fillna(0).sort('event')
Out[16]:
   event ID  amount
0      1  a       3
1      1  b       3
2      1  c       4
5      1  e       0
3      2  a       5
4      2  d       6

If you have different column names that you want to merge on, in this case let's say that 'ID' in Donors should be 'fundraiser ID' you can do

In [42]: merge(foo, fundraisers, left_on=['fundraiser ID', 'event'], right_on=['ID', 'event'], how='outer')
Out[42]:
   event fundraiser ID  amount ID
0      1             a       3  a
1      1             b       3  b
2      1             c       4  c
3      2             a       5  a
4      2             d       6  d
5      1           NaN     NaN  e
Phillip Cloud
  • 24,919
  • 11
  • 68
  • 88
1
>>> indexed = fundraisers.set_index(['ID', 'event'])
>>> indexed['amount'] = Donors.groupby(['ID', 'event'])['amount'].sum()
>>> indexed
          amount
ID event        
a  1           3
b  1           3
c  1           4
a  2           5
d  2           6
e  1         NaN
Viktor Kerkez
  • 45,070
  • 12
  • 104
  • 85