I have some time stamped financial data as shown below:
Sample Data:
transaction_type transaction_announced_date transaction_size_USDmm target_company_name ------------------ ---------------------------- ------------------------ --------------------- B 11/12/2017 8000 Company A A 4/19/2017 NULL Company A A 2/12/2016 200 Company A A 5/24/2016 NULL Company A A 6/1/2016 3500 Company A B 7/7/2016 NULL Company A A 9/22/2016 30 Company A A 12/4/2014 2800 Company A A 1/16/2015 1691 Company B A 3/22/2015 NULL Company B B 7/31/2015 1000 Company C A 8/19/2015 NULL Company C A 8/25/2015 NULL Company C
For companies that have had a transaction B, I want to find the sum of that company's prior transactions A (based on announced dates) and add that value in a new column called 'sum_prior_trans_A'.
Expected Results:
transaction_type transaction_announced_date transaction_size_USDmm target_company_name sum_prior_trans_A ------------------ ---------------------------- ------------------------ --------------------- ------------------- B 11/12/2017 8000 Company A 6530 B 7/7/2016 NULL Company A 2830 B 7/31/2015 1000 Company C NaN
Current approach:
#input dataframe
trans_data
#add a new column that is the sum of all prior transactions A.
#Will later drop all transactions A rows to be only left with transactions B as desired.
trans_data['sum_previous_private_placements'] = trans_data.groupby(['target_company_name', 'transaction_type', 'transaction_announced_date']).filter(lambda row: (trans_data['target_company_name'] == row['target_company_name']) & (trans_data['transaction_announced_date'] == row['transaction_announced_date']) & (trans_data['transaction_type'] == 'A'))['transaction_size_USDmm'].sum()
I am receiving the following error:
ValueError: Can only compare identically-labeled Series objects
How can I find the sum of prior transactions A for each row (company) and add that value in a new column called 'sum_prior_trans_A' without running into misaligned Series object error?