I need to write a loop that checks whether one-hot encoding was done properly, for example cases where an invoice has a 1 in the wrong column, or where an invoice has multiple 1s for a single variable. The output should be a dataframe listing the potential errors for each invoice (row).

I have these two dataframes:

Invoice_ID_raw:

Invoice ID | Type of purchase | Paid |
---|---|---|
1233 | Remote | CASH |
4566 | Paid upon arrival | CARD |
4458 | Remote | IN ADVANCE |

Invoice_ID_after_one_hot:

Invoice ID | Type of purchase_Remote | Type of purchase_Paid upon arrival | Paid_CASH | Paid_CARD | Paid_IN ADVANCE |
---|---|---|---|---|---|
1233 | 1 | 0 | 1 | 0 | 0 |
4566 | 0 | 1 | 0 | 1 | 0 |
4458 | 1 | 0 | 0 | 0 | 1 |

Desired layout of the output dataframe:

Invoice ID | Type of purchase_correct_encoding? | Paid_correct_encoding? |
---|---|---|
1233 | Correct | Correct |
4566 | Correct | Correct |
4458 | Correct | NOT CORRECT |
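
A minimal sketch of one way such a check could look with pandas, not a definitive implementation. It assumes the encoded columns are named `<raw column>_<value>` with consistent casing (the pattern `pd.get_dummies` produces by default), and that `Invoice ID` is unique in both dataframes. The names `raw`, `encoded`, and `check_one_hot` are hypothetical, and the encoded row for invoice 4458 is deliberately altered here so that the check has an error to report.

```python
import pandas as pd

# Raw data, copied from the first table.
raw = pd.DataFrame({
    "Invoice ID": [1233, 4566, 4458],
    "Type of purchase": ["Remote", "Paid upon arrival", "Remote"],
    "Paid": ["CASH", "CARD", "IN ADVANCE"],
})

# Encoded data. Assumption: columns follow "<raw column>_<value>".
# Invoice 4458 is deliberately wrong (1 in Paid_CASH instead of Paid_IN ADVANCE).
encoded = pd.DataFrame({
    "Invoice ID": [1233, 4566, 4458],
    "Type of purchase_Remote": [1, 0, 1],
    "Type of purchase_Paid upon arrival": [0, 1, 0],
    "Paid_CASH": [1, 0, 1],
    "Paid_CARD": [0, 1, 0],
    "Paid_IN ADVANCE": [0, 0, 0],
})


def check_one_hot(raw, encoded, id_col="Invoice ID"):
    """Return one row per invoice with Correct / NOT CORRECT per raw column."""
    # Line up raw and encoded rows by invoice.
    merged = raw.merge(encoded, on=id_col, how="left")
    report = merged[[id_col]].copy()

    # Loop over every raw column except the ID.
    for col in raw.columns.drop(id_col):
        prefix = f"{col}_"
        dummy_cols = [c for c in encoded.columns if c.startswith(prefix)]

        # Name of the dummy column that should hold the 1 in each row.
        expected = prefix + merged[col].astype(str)

        # Expected 0/1 pattern: 1 in the expected column, 0 everywhere else.
        expected_hot = pd.DataFrame(
            {c: expected.eq(c) for c in dummy_cols}, index=merged.index
        ).astype(int)

        # A row passes only if every dummy column matches the expected pattern
        # and the raw value actually has a dummy column.
        ok = merged[dummy_cols].eq(expected_hot).all(axis=1) & expected.isin(dummy_cols)
        report[f"{col}_correct_encoding?"] = ok.map(
            {True: "Correct", False: "NOT CORRECT"}
        )
    return report


report = check_one_hot(raw, encoded)
print(report)
```

Comparing the whole row against the expected 0/1 pattern catches a 1 in the wrong column, several 1s for one variable, and no 1 at all in a single test, so separate checks for each case are not needed.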

Could you please help? I am still new to Python, so any guidance would be greatly appreciated!