I need to write a loop that checks whether one-hot encoding was done properly, i.e. it should catch cases where an invoice has a 1 in the wrong column, or where an invoice has multiple 1s for a single variable. The output should be a DataFrame listing the potential errors for each invoice (row).

I have these two DataFrames:

Invoice_ID_raw:

Invoice ID  Type of purchase   Paid
1233        Remote             CASH
4566        Paid upon arrival  CARD
4458        Remote             IN ADVANCE

Invoice_ID_after_one_hot:

Invoice ID  Type of purchase_Remote  Type of purchase_Paid upon arrival  Paid_CASH  Paid_CARD  Paid_IN ADVANCE
1233        1                        0                                   1          0          0
4566        0                        1                                   0          1          0
4458        1                        0                                   0          0          1

Desired DataFrame layout as the output of the loop:

Invoice ID  Type of purchase_correct_encoding?  Paid_correct_encoding?
1233        Correct                             Correct
4566        Correct                             Correct
4458        Correct                             NOT CORRECT

Could you please help? I am still new to Python and would greatly appreciate it!

  • Please don’t post images of the data as we can’t test them. Instead, post a sample of the DataFrame(s) and expected output directly inside a code block. This allows us to easily reproduce your problem and help you. Otherwise, the probability of you getting any answer is low. Take the time to read [How to create a Minimal, Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example) and [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) and revise your question accordingly. – Rodalm Dec 14 '21 at 20:42
  • Understood, will adjust. Thanks! – icemansssss Dec 14 '21 at 20:43
  • I see that you've updated, much better now, thanks! However, the expected output is still not clear. I think it's better if you elaborate an example with a wrong one hot encoding result, and then share the expected output i.e. the 'dataframe showcasing the potential list of errors for each invoice/row.' – Rodalm Dec 14 '21 at 21:46

1 Answer

You can use pd.get_dummies():

pd.get_dummies(df[['Type of purchase', 'Paid']]).drop(columns=['Paid_IN ADVANCE'])
AfterFray
  • Thank you, but I need something different: the encoding was already done using pd.get_dummies. What I am trying to write is a loop that checks whether pd.get_dummies did it correctly, meaning the 1s and 0s are in the proper dummy columns for each row (invoice). – icemansssss Dec 15 '21 at 08:37
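
The check described in the comment above could be sketched roughly as follows. This is only one possible approach, not something confirmed in the thread. It assumes that both frames contain the same unique invoice IDs, that dummy columns are named `<variable>_<value>` (the pd.get_dummies default), and that the helper name `check_one_hot` and the deliberately corrupted row are hypothetical, added purely for illustration. For each variable it loops once, checks that every row has exactly one 1, and checks that the 1 sits in the column matching the raw value.

```python
import pandas as pd

# Sample data mirroring the tables in the question.
raw = pd.DataFrame({
    "Invoice ID": [1233, 4566, 4458],
    "Type of purchase": ["Remote", "Paid upon arrival", "Remote"],
    "Paid": ["CASH", "CARD", "IN ADVANCE"],
})
encoded = pd.get_dummies(raw, columns=["Type of purchase", "Paid"], dtype=int)

# Assumption for illustration only: corrupt invoice 4458 so the check has
# something to find (the 1 is moved from "Paid_IN ADVANCE" to "Paid_CASH").
mask = encoded["Invoice ID"] == 4458
encoded.loc[mask, ["Paid_IN ADVANCE", "Paid_CASH"]] = [0, 1]


def check_one_hot(raw_df, encoded_df, id_col, variables):
    """Hypothetical helper: flag each variable per invoice as Correct / NOT CORRECT."""
    raw_idx = raw_df.set_index(id_col)
    # Put the encoded rows in the same order as the raw rows.
    enc_idx = encoded_df.set_index(id_col).loc[raw_idx.index]
    report = pd.DataFrame(index=raw_idx.index)

    for var in variables:
        prefix = f"{var}_"
        dummy_cols = [c for c in enc_idx.columns if c.startswith(prefix)]
        dummies = enc_idx[dummy_cols]

        # Rule 1: only 0/1 values, and exactly one 1 per row.
        single_one = dummies.isin([0, 1]).all(axis=1) & dummies.sum(axis=1).eq(1)

        # Rule 2: the column holding the 1 must match the raw value.
        decoded = dummies.idxmax(axis=1).str[len(prefix):]
        matches_raw = decoded.eq(raw_idx[var])

        ok = single_one & matches_raw
        report[f"{var}_correct_encoding?"] = ok.map(
            {True: "Correct", False: "NOT CORRECT"}
        )

    return report.reset_index()


report = check_one_hot(raw, encoded, "Invoice ID", ["Type of purchase", "Paid"])
print(report)
```

With the corrupted row in place, invoice 4458 is flagged NOT CORRECT for Paid and every other cell is Correct. Without the corruption step, all cells come out as Correct, since pd.get_dummies itself places the 1s consistently.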