The aim is to detect fraud in this dataset.
I have two DataFrames with the following columns:
DF1 [customerEmail, customerphone, customerdevice, customeripadd, NoOftransactions, Fraud, ...], shape (168, 11)
DF2 [customerEmail, transactionid, payment methods, orderstatus, ...], shape (623, 11)
The customerEmail column is common to both DataFrames, so it makes sense to merge them on customerEmail.
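Before merging on a shared key, it can help to check whether the key is unique on each side, because a merge only behaves like a one-row lookup when the key is unique in the table being looked up. A minimal sketch, using hypothetical toy frames (the email values and the reduced column set are invented, not taken from the Kaggle data):

```python
import pandas as pd

# Hypothetical toy stand-ins for DF1 (one row per customer) and DF2 (one row per transaction).
DF1 = pd.DataFrame({
    "customerEmail": ["a@x.com", "b@x.com", "c@x.com"],
    "Fraud": [False, True, False],
})
DF2 = pd.DataFrame({
    "customerEmail": ["a@x.com", "a@x.com", "b@x.com", "d@x.com"],
    "transactionid": [1, 2, 3, 4],
})

# Is the merge key unique in each frame?
print(DF1["customerEmail"].is_unique)  # True
print(DF2["customerEmail"].is_unique)  # False

# Which emails repeat in DF2, and how many times.
counts = DF2["customerEmail"].value_counts()
dupes = counts[counts > 1]
print(dupes)
```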
The problem is that customerEmail values repeat in DF2, and those repeats have no corresponding rows in DF1. So when I merge using:
    DF3 = pd.merge(DF1, DF2, on='customerEmail')
the result has shape (819, 18), and the rows for repeated email IDs carry misleading data.
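The row growth is what pandas does by design for a one-to-many key: the default inner join repeats each DF1 row once per matching DF2 row and drops DF2 emails that are absent from DF1. A minimal sketch reproducing this on hypothetical toy frames, and using the real `validate=` and `indicator=` parameters of `pd.merge` to surface the problem instead of letting it pass silently:

```python
import pandas as pd

# Hypothetical toy frames (not the Kaggle data).
DF1 = pd.DataFrame({
    "customerEmail": ["a@x.com", "b@x.com", "c@x.com"],
    "Fraud": [False, True, False],
})
DF2 = pd.DataFrame({
    "customerEmail": ["a@x.com", "a@x.com", "b@x.com", "d@x.com"],
    "transactionid": [1, 2, 3, 4],
})

# Default inner join: a@x.com appears twice, b@x.com once,
# c@x.com (no DF2 match) and d@x.com (no DF1 match) are dropped.
DF3 = pd.merge(DF1, DF2, on="customerEmail")
print(DF3.shape)  # (3, 3)

# validate="one_to_one" raises instead of silently duplicating DF1 rows.
try:
    pd.merge(DF1, DF2, on="customerEmail", validate="one_to_one")
except pd.errors.MergeError as err:
    print("not one-to-one:", err)

# An outer join with indicator=True labels each row as left_only, right_only or both.
check = pd.merge(DF1, DF2, on="customerEmail", how="outer", indicator=True)
print(check["_merge"].value_counts())
```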
I want the match to be based on customerEmail from DF1, so that my final DataFrame DF3 has roughly the same number of rows as DF1.
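One common way to get one row per DF1 customer is to collapse DF2 to one row per customerEmail first, then do a left join. A minimal sketch on hypothetical toy frames; the summary columns (`n_transactions`, `n_failed`) and the `orderstatus` values are illustrative assumptions, not features from the real dataset:

```python
import pandas as pd

# Hypothetical toy frames (not the Kaggle data).
DF1 = pd.DataFrame({
    "customerEmail": ["a@x.com", "b@x.com", "c@x.com"],
    "Fraud": [False, True, False],
})
DF2 = pd.DataFrame({
    "customerEmail": ["a@x.com", "a@x.com", "b@x.com", "d@x.com"],
    "transactionid": [1, 2, 3, 4],
    "orderstatus": ["fulfilled", "failed", "fulfilled", "pending"],
})

# Collapse DF2 to one row per email. The chosen aggregations are
# assumptions; any per-customer summary could be used instead.
DF2_per_customer = (
    DF2.groupby("customerEmail", as_index=False)
       .agg(n_transactions=("transactionid", "count"),
            n_failed=("orderstatus", lambda s: (s == "failed").sum()))
)

# Left join keeps every DF1 row exactly once; validate="many_to_one"
# raises if the right-hand keys are still duplicated.
DF3 = pd.merge(DF1, DF2_per_customer, on="customerEmail",
               how="left", validate="many_to_one")
print(DF3)
print(len(DF3) == len(DF1))  # True
```

Customers with no transactions in DF2 (c@x.com here) get NaN in the summary columns. If a single transaction per customer is enough, `DF2.drop_duplicates(subset="customerEmail", keep="first")` before the left merge also yields len(DF1) rows, at the cost of discarding the other transactions.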
Here's a link to the data if you'd like to take a look: https://www.kaggle.com/aryanrastogi7767/ecommerce-fraud-data
Cheers