Loop for one-hot encoding quality checks for pandas DataFrames

Question

I need to write a loop that checks whether one-hot encoding was done properly, for example catching cases where an invoice has a 1 in the wrong column, or where an invoice has multiple 1s for a single variable. The output should be a dataframe listing the potential errors for each invoice/row.

Here I have these two dataframes:

Invoice_ID_raw:

Invoice ID	Type of purchase	Paid
1233	Remote	CASH
4566	Paid upon arrival	CARD
4458	Remote	IN ADVANCE

Invoice_ID_after_one_hot:

Invoice ID	Type of purchase_Remote	Type of purchase_Paid upon arrival	Paid_CASH	PAID_CARD	PAID_IN ADVANCE
1233	1	0	1	0	0
4566	0	1	0	1	0
4458	1	0	0	0	1

Desired dataframe layout as the output of the loop:

Invoice ID	Type of purchase_correct_encoding?	Paid_correct_encoding?
1233	Correct	Correct
4566	Correct	Correct
4458	Correct	NOT CORRECT

Could you please help? I am still new to Python and would greatly appreciate it!

Please don’t post images of the data as we can’t test them. Instead, post a sample of the DataFrame(s) and expected output directly inside a code block. This allows us to easily reproduce your problem and help you. Otherwise, the probability of you getting any answer is low. Take the time to read [How to create a Minimal, Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example) and [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) and revise your question accordingly. — Rodalm, Dec 14 '21 at 20:42
I see that you've updated, much better now, thanks! However, the expected output is still not clear. I think it's better if you elaborate an example with a wrong one hot encoding result, and then share the expected output i.e. the 'dataframe showcasing the potential list of errors for each invoice/row.' — Rodalm, Dec 14 '21 at 21:46

score 0 · Answer 1 · answered Dec 15 '21 at 01:15

You can use pd.get_dummies():

pd.get_dummies(df[['Type of purchase', 'Paid']]).drop(columns=['Paid_IN ADVANCE'])
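For reproducibility, here is a minimal self-contained sketch of that call on the sample data from the question. The frame name `df` and the `encoded` variable are assumptions for illustration, and the sketch keeps every dummy column (no `.drop`), since the encoded table in the question includes the IN ADVANCE column.

```python
import pandas as pd

# Raw invoice table, copied from the question (assumed to be called df).
df = pd.DataFrame({
    "Invoice ID": [1233, 4566, 4458],
    "Type of purchase": ["Remote", "Paid upon arrival", "Remote"],
    "Paid": ["CASH", "CARD", "IN ADVANCE"],
})

# One-hot encode the two categorical columns as 0/1 integers.
dummies = pd.get_dummies(df[["Type of purchase", "Paid"]], dtype=int)

# Put the invoice ID back in front so each row can still be identified.
encoded = pd.concat([df[["Invoice ID"]], dummies], axis=1)
print(encoded)
```

Note that get_dummies names the new columns `<column>_<value>`, so the result has `Paid_CARD` rather than the `PAID_CARD` spelling shown in the question's table.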

AfterFray

Thank you, but I need something different, as the encoding was already done using pd.get_dummies. What I am trying to write is a loop that checks whether pd.get_dummies has done it correctly, meaning the 1s and 0s are in the proper dummy columns for each row (invoice). – icemansssss Dec 15 '21 at 08:37
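One way the check described in this comment could be sketched is shown below. It is only an illustration under stated assumptions: the helper name `check_one_hot` is hypothetical, invoice IDs are assumed unique and present in both frames, and the dummy columns are assumed to follow the `<column>_<value>` naming that pd.get_dummies produces (columns spelled `PAID_...` as in the question's table would need renaming first). For each original column it verifies that exactly one dummy is set and that the set dummy matches the raw value. One row is corrupted on purpose so the report has something to flag.

```python
import pandas as pd

raw = pd.DataFrame({
    "Invoice ID": [1233, 4566, 4458],
    "Type of purchase": ["Remote", "Paid upon arrival", "Remote"],
    "Paid": ["CASH", "CARD", "IN ADVANCE"],
})
encoded = pd.get_dummies(raw, columns=["Type of purchase", "Paid"], dtype=int)

# Deliberately break one row: invoice 4458 now has two 1s among the Paid dummies.
encoded.loc[encoded["Invoice ID"] == 4458, "Paid_CASH"] = 1


def check_one_hot(raw, encoded, id_col, cat_cols):
    """Hypothetical helper: report per invoice whether each column is encoded correctly."""
    raw = raw.set_index(id_col)
    encoded = encoded.set_index(id_col)
    report = pd.DataFrame(index=raw.index)
    for col in cat_cols:
        prefix = f"{col}_"
        dummy_cols = [c for c in encoded.columns if c.startswith(prefix)]
        # Align the dummy block to the raw rows (assumes unique, matching IDs).
        block = encoded.loc[raw.index, dummy_cols].astype(int)
        # Exactly one dummy may be set per row.
        exactly_one = block.sum(axis=1) == 1
        # The dummy that is set must correspond to the raw value.
        flagged_value = block.idxmax(axis=1).str[len(prefix):]
        matches_raw = flagged_value == raw[col].astype(str)
        ok = exactly_one & matches_raw
        report[f"{col}_correct_encoding?"] = ok.map(
            lambda v: "Correct" if v else "NOT CORRECT"
        )
    return report.reset_index()


report = check_one_hot(raw, encoded, "Invoice ID", ["Type of purchase", "Paid"])
print(report)
```

The loop runs over the original columns rather than over rows, so each check is a vectorised comparison; the resulting frame has one row per invoice and one `..._correct_encoding?` column per variable, as in the desired layout.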