For a DataFrame:
>>> df = DataFrame([['2021-03-31', 'A0019', '990RT', 'OFFSET', '0.10'],['2021-03-31', 'A1019', '990CT', 'MARK', '0.10'],['2021-03-31', 'A0019', '990RT', 'M
ARK', '100'],['2021-03-31', 'A0019', '990RT', 'OFFSET', '0.70'],['2021-03-31', 'A0029', '990CT', 'OFFSET', '1.10'],['2021-03-31', 'A0029', '990CT', 'MARK',
'9.10'],['2021-03-31', 'A0019', '990QT', 'MARK', '99.10'], ['2021-03-31', 'C0019', '990QT', 'OFFSET', '1'], ['2021-03-31', 'C0019', '990QT', 'GHTC', '5'],
['2021-03-31', 'C0019', '990QT', 'OFFSET', '15']], columns=['DATE','A_ID','R_ID','TYPE','I_VAL'] )
>>> df
DATE A_ID R_ID TYPE I_VAL
0 2021-03-31 A0019 990RT OFFSET 0.10
1 2021-03-31 A1019 990CT MARK 0.10
2 2021-03-31 A0019 990RT MARK 100
3 2021-03-31 A0019 990RT OFFSET 0.70
4 2021-03-31 A0029 990CT OFFSET 1.10
5 2021-03-31 A0029 990CT MARK 9.10
6 2021-03-31 A0019 990QT MARK 99.10
7 2021-03-31 C0019 990QT OFFSET 1
8 2021-03-31 C0019 990QT GHTC 5
9 2021-03-31 C0019 990QT OFFSET 15
Each NON OFFSET
(e.g. MARK
, GHTC
) row uniquely matches to ZERO or more OFFSET
rows based on the combination of DATE, A_ID, R_ID
. That is, there is a ONE-TO-MANY relation between NON OFFSET
(e.g. MARK
) to OFFSET
rows.
I need to complete an operation in two steps:
- Aggregate the values of rows if the value
DATE, A_ID, R_ID
is same. Put the aggregated value as the value ofI_VAL
in the NONOFFSET
row. - Remove rows with
TYPE
OFFSET
.
The resultant DataFrame is:
# The rows with TYPE OFFSET are removed from resulting df.
# Keeping the OFFSET rows for explaining aggregation
# 0, 1, 2, 3, etc. are the indexes (row number) of the rows
DATE A_ID R_ID TYPE I_VAL
0 2021-03-31 A0019 990RT OFFSET 0.10
1 2021-03-31 A1019 990CT MARK 0.10 # no update, condition not met
2 2021-03-31 A0019 990RT MARK 100.80 # updated with sum of 0, self, and 3
3 2021-03-31 A0019 990RT OFFSET 0.70
4 2021-03-31 A0029 990CT OFFSET 1.10
5 2021-03-31 A0029 990CT MARK 10.20 # updated with sum of own value and 4
6 2021-03-31 A0019 990QT MARK 99.10 # no update, condition not met
7 2021-03-31 C0019 990QT OFFSET 1
8 2021-03-31 C0019 990QT GHTC 21 # updated with sum of self, 7, and 9
9 2021-03-31 C0019 990QT OFFSET 15
For step 2, I can do:
filtered_df = df[df.TYPE != 'OFFSET']
But, I couldn't figure out how would I aggregate the values? This post discusses a similar problem, but I couldn't modify it for my requirement.