I have two pandas DataFrames, df1 and df2. df1 has 6 columns and df2 has 5. The first column in both DataFrames holds strings and the remaining columns hold integers.
I want to identify the records in df1 whose first 3 columns (x1, x2, x3) do not match any record in df2, and pull those mismatched records out of df1.
To do that I tried the following code, but it gives me NaN values, and if I drop the NaN values the data I need is deleted as well.
Input data:
**df1:**
x1   x2  x3  x4  x5  x6
SM   1   1   2   3   3
RK   2   2   3   4   5
NBR  1   2   2   5   6
CBK  2   5   6   7   8
VSB  5   6   4   2   1
SB   6   2   3   2   1
**df2:**
x1   x2  x3  x4  x5
RK   2   4   3   4
SM   1   1   3   3
NB   1   2   3   2
VSB  5   6   3   2
CB   2   6   4   1
SB   6   2   4   1
Expected output:
x1 x2 x3 x4 x5 x6
RK 2 2 3 4 5
CBK 2 5 6 7 8
NBR 1 2 2 5 6
Code I tried:
data_out = df1[~df1[['x1', 'x2', 'x3']].isin(df2[['x1', 'x2', 'x3']])]
data_out = data_out.dropna()
Could anyone please help me tackle this?
Thanks in advance.
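
For reference, here is a minimal sketch of one possible approach. It assumes that a row of df1 counts as matched only when x1, x2 and x3 all agree with the same row of df2. It swaps the `isin` mask for a left merge with `indicator=True`, then keeps the rows flagged `left_only`. The sample frames are rebuilt from the tables in the question; the names `keys` and `merged` are illustrative only.

```python
import pandas as pd

# Sample frames copied from the tables in the question.
df1 = pd.DataFrame({
    "x1": ["SM", "RK", "NBR", "CBK", "VSB", "SB"],
    "x2": [1, 2, 1, 2, 5, 6],
    "x3": [1, 2, 2, 5, 6, 2],
    "x4": [2, 3, 2, 6, 4, 3],
    "x5": [3, 4, 5, 7, 2, 2],
    "x6": [3, 5, 6, 8, 1, 1],
})
df2 = pd.DataFrame({
    "x1": ["RK", "SM", "NB", "VSB", "CB", "SB"],
    "x2": [2, 1, 1, 5, 2, 6],
    "x3": [4, 1, 2, 6, 6, 2],
    "x4": [3, 3, 3, 3, 4, 4],
    "x5": [4, 3, 2, 2, 1, 1],
})

# Columns used to decide whether a df1 row has a counterpart in df2.
keys = ["x1", "x2", "x3"]

# Left-merge df1 against the distinct key combinations of df2.
# indicator=True adds a "_merge" column; "left_only" marks df1 rows
# whose (x1, x2, x3) combination does not occur in df2.
merged = df1.merge(df2[keys].drop_duplicates(), on=keys, how="left", indicator=True)

# Keep only the unmatched rows and drop the helper column.
data_out = merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")
print(data_out)
```

With this sketch the rows come back in df1 order (RK, NBR, CBK) and the merge result gets a fresh index, so sort or reindex afterwards if a different order is needed. No NaN values appear because only the key columns of df2 are merged in.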