Supposing I have a df like the following,
col1 | type | date_1 | date_2 | date_3 |.... | date_n
ab | A | -10 | | -10
ab | B | 100 | 99 | -12
cd | A | 0 | -25 | 6
cd | B | -1 | 8 | -34
ab | A | 98 | -9 | 0
ab | B | -7 | -2 | 0
First step is to remove all positive numbers including 0
Now the df should look like,
col1 | type | date_1 | date_2 | date_3 | .... | date_n
ab | A | -10 | | -10 |
ab | B | | | -12 |
cd | A | | -25 | |
cd | B | -1 | | -34 |
ab | A | | -9 | |
ab | B | -7 | -2 | |
Second step is to compare the numbers for each 'date' col as per 'type' A and B,
If the 'type' A row has a negative number and 'type' B is blank, then remove the negative number, of 'date' col, of 'type' A
If the 'type' B row has a negative number and 'type' A is blank, then do nothing
If both types are blank do nothing
After this step, the df should look like this,
col1 | type | date_1 | date_2 | date_3 | .... | date_n
ab | A | | | -10 |
ab | B | | | -12 |
cd | A | | | |
cd | B | -1 | | -34 |
ab | A | | -9 | |
ab | B | -7 | -2 | |
Final step,
If both types are negative for the current, for each set of col1 (ab,cd,ab), check the left-hand-side value of same Ath and Bth of the same row,
1) If both types A and B values are blank, then remove the remove the negative number of current row 'type' A and keep the -ve number of 'type' B 2) If either of the types are blank, then remove the negative of the current row 'type' B and keep the -ve number of 'type' A
Finally, the final_df should look like this,
col1 | type | date_1 | date_2 | date_3 | .... | date_n
ab | A | | | |
ab | B | | | -12 |
cd | A | | | |
cd | B | -1 | | -34 |
ab | A | | -9 | |
ab | B | -7 | | |
For the final step, the comparison should start from the 'date_2'.
What would be the best way to solve this problem? Any help would be greatly appreciated!
Note: I cannot use the column headers (the date ones) to manipulate data because they will keep changing.
Test Data:
{'column1': ['CT', 'CT', 'NBB', 'NBB', 'CT', 'CT', 'NBB', 'NBB', 'HHH', 'HHH', 'TP1', 'TP1', 'TPR', 'TPR', 'PP1', 'PP1', 'PP1', 'PP1'], 'column2': ['POUPOU', 'POUPOU', 'PRPRP', 'PRPRP', 'STDD', 'STDD', 'STDD', 'STDD', 'STEVT', 'STEVT', 'SYSYS', 'SYSYS', 'SYSYS', 'SYSYS', 'SHW', 'SHW', 'JV', 'JV'], 'column3': ['V', 'CV', 'V', 'CV', 'V', 'CV', 'V', 'CV', 'V', 'CV', 'V', 'CV', 'V', 'CV', 'V', 'CV', 'V', 'CV'], 'column4': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B'], 'column5': [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan], 'column6': ['BBB', 'BBB', 'CCC', 'CCC', 'BBB', 'BBB', 'BBB', 'BBB', 'VVV', 'VVV', 'CHCH', 'CHCH', 'CHCH', 'CHCH', 'CCC', 'CCC', 'CHCH', 'CHCH'], 'column7': ['Apr-21', 'Apr-21', 'Apr-21', 'Apr-21', 'Apr-21', 'Apr-21', 'Apr-21', 'Apr-21', 'Mar-21', 'Mar-21', 'Mar-21', 'Mar-21', 'Mar-21', 'Mar-21', 'Apr-21', 'Apr-21', 'Mar-21', 'Mar-21'], 'Feb-21': [11655, 0, 0, 0, 121117, 0, 14948, 0, 0, 0, 0, 0, 0, 0, 1838, 0, 0, 0], 'Mar-21': [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -16474.0, -16474.0, 7000.0, 7000.0, -19946.0, -19946.0, 16084.44444444444, 0.0, 0.0, 0.0], 'Apr-21': [104815.0, 104815.0, 17949.0, 17949.0, 96132.0, 96132.0, 0.0, 0.0, -17001.0, -33475.0, -878.0, 6122.0, 8398.0, -11548.0, -5297.073170731703, -5297.073170731703, -282.0, -282.0], 'May-21': [78260.0, 183075.0, 42557.0, 60506.0, -15265.0, 80867.0, -18.0, -18.0, 21084.0, -12391.0, -1831.0, 4291.0, 2862.0, -8686.0, 5261.25, -35.8231707317027, -369.0, -651.0], 'Jun-21': [-52480.0, 130595.0, -13258.0, 47248.0, -35577.0, 45290.0, 2434.0, 2416.0, 31147.0, 18756.0, -4310.0, -19.0, -4750.0, -13436.0, -92.0, -127.8231707317027, -280.0, -931.0], 'Jul-21': [-174544.0, -43949.0, -38127.0, 9121.0, -124986.0, -79696.0, -9707.0, -7291.0, 13577.0, 32333.0, 0.0, -19.0, -15746.0, -29182.0, 93.0, -34.8231707317027, -319.0, -1250.0], 'Aug-21': [35498.0, -8451.0, -37094.0, -27973.0, 79021.0, -675.0, -1423.0, -8714.0, 32168.0, 64501.0, 0.0, -19.0, 18702.0, -10480.0, 4347.634146341465, 4312.810975609762, -341.0, -1591.0], 'Sep-21': [44195.0, 35744.0, 2039.0, -25934.0, 70959.0, 70284.0, 2816.0, -5898.0, 38359.0, 102860.0, 0.0, -19.0, 18119.0, 7639.0, 5302.222222222219, 9615.033197831981, 0.0, -1591.0], 'Oct-21': [-13163.0, 22581.0, -4773.0, -30707.0, 205080.0, 275364.0, -709.0, -6607.0, -1397.0, 101463.0, 0.0, -19.0, 0.0, 7639.0, -34.0, 9581.033197831981, 0.0, -1591.0]}
Expected output:
{'column1': ['CT', 'CT', 'NBB', 'NBB', 'CT', 'CT', 'NBB', 'NBB', 'HHH', 'HHH', 'TP1', 'TP1', 'TPR', 'TPR', 'PP1', 'PP1', 'PP1', 'PP1'], 'column2': ['POUPOU', 'POUPOU', 'PRPRP', 'PRPRP', 'STDD', 'STDD', 'STDD', 'STDD', 'STEVT', 'STEVT', 'SYSYS', 'SYSYS', 'SYSYS', 'SYSYS', 'SHW', 'SHW', 'JV', 'JV'], 'column3': ['V', 'CV', 'V', 'CV', 'V', 'CV', 'V', 'CV', 'V', 'CV', 'V', 'CV', 'V', 'CV', 'V', 'CV', 'V', 'CV'], 'column4': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B'], 'column5': [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan], 'column6': ['BBB', 'BBB', 'CCC', 'CCC', 'BBB', 'BBB', 'BBB', 'BBB', 'VVV', 'VVV', 'CHCH', 'CHCH', 'CHCH', 'CHCH', 'CCC', 'CCC', 'CHCH', 'CHCH'], 'column7': ['Apr-21', 'Apr-21', 'Apr-21', 'Apr-21', 'Apr-21', 'Apr-21', 'Apr-21', 'Apr-21', 'Mar-21', 'Mar-21', 'Mar-21', 'Mar-21', 'Mar-21', 'Mar-21', 'Apr-21', 'Apr-21', 'Mar-21', 'Mar-21'], 'Feb-21': [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan], 'Mar-21': [nan, nan, nan, nan, nan, nan, nan, nan, nan, -16474.0, nan, nan, nan, -19946.0, nan, nan, nan, nan], 'Apr-21': [nan, nan, nan, nan, nan, nan, nan, nan, -17001.0, nan, nan, nan, nan, -11548.0, nan, -5297.073170731703, nan, -282.0], 'May-21': [nan, nan, nan, nan, nan, nan, nan, -18.0, nan, -12391.0, nan, nan, nan, -8686.0, nan, -35.8231707317027, -369.0, nan], 'Jun-21': [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, -19.0, -4750.0, nan, -92.0, nan, -280.0, nan], 'Jul-21': [nan, -43949.0, nan, nan, nan, -79696.0, nan, -7291.0, nan, nan, nan, -19.0, -15746.0, nan, nan, -34.8231707317027, -319.0, nan], 'Aug-21': [nan, -8451.0, nan, -27973.0, nan, -675.0, -1423.0, nan, nan, nan, nan, -19.0, nan, -10480.0, nan, nan, -341.0, nan], 'Sep-21': [nan, nan, nan, -25934.0, nan, nan, nan, -5898.0, nan, nan, nan, -19.0, nan, nan, nan, nan, nan, -1591.0], 'Oct-21': [nan, nan, -4773.0, nan, nan, nan, -709.0, nan, nan, nan, nan, -19.0, nan, nan, nan, nan, nan, -1591.0]}