Getting uncommon records from two dataframes

Question

I am trying to get the difference between two dataframes. Specifically, I want to remove a random sample of records from each dataframe and put the removed records into separate dataframes. I followed the approach explained in Comparing two dataframes and getting the differences:

import pandas as pd

train_abusive = pd.read_csv('train_abusive.csv', low_memory=False)
train_non_abusive = pd.read_csv('train_non_abusive.csv', low_memory=False)
print(len(train_abusive), len(train_non_abusive))

# Hold out 10% / 20% of the rows as validation sets
val_abusive = train_abusive.sample(frac=0.1)
val_non_abusive = train_non_abusive.sample(frac=0.2)

# Remove the sampled rows from the training sets by stacking them with the
# originals and dropping every row that appears more than once
train_abusive = pd.concat([val_abusive, train_abusive], ignore_index=True)
train_abusive = train_abusive.drop_duplicates(keep=False)

train_non_abusive = pd.concat([val_non_abusive, train_non_abusive], ignore_index=True)
train_non_abusive = train_non_abusive.drop_duplicates(keep=False)

print(len(train_abusive), len(train_non_abusive))

It gives the following output:

50000 200000
44596 155010

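But the numbers don't add up: removing 10% of 50000 rows (5000) and 20% of 200000 rows (40000) should leave 45000 and 160000 rows, yet I get 44596 and 155010. I am not sure why.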
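A likely cause, assuming the CSV files contain some rows that are exact duplicates of each other: drop_duplicates(keep=False) discards every copy of any row that occurs more than once, so rows that were already duplicated in the original data disappear together with the sampled rows. The sketch below reproduces this on a hypothetical five-row frame in which the "a" row appears twice and only the "c" row is "sampled".

```python
import pandas as pd

# Hypothetical toy data: the "a" row appears twice in the source data.
df = pd.DataFrame({"text": ["a", "a", "b", "c", "d"]})

# Pretend the random sample picked only the "c" row.
sample = df.iloc[[3]]

# Same trick as in the question: stack sample + original, then drop every
# row that occurs more than once (keep=False keeps none of the copies).
combined = pd.concat([sample, df], ignore_index=True)
remaining = combined.drop_duplicates(keep=False)

print(len(df) - len(sample))       # 4 rows expected
print(len(remaining))              # 2: both "a" rows were dropped as well
print(remaining["text"].tolist())  # ['b', 'd']
```

On the real files, train_abusive.duplicated(keep=False).sum() counts how many rows belong to a duplicated group and would therefore be at risk of being removed by this approach.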

flmlopes · Answer 1 · 2018-10-13T00:37:26.533

Edited: if you only want to check whether the two dataframes are identical, you can use assert_frame_equal.

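import pandas as pd
from pandas.testing import assert_frame_equal  # pandas.util.testing is deprecated

train_abusive = pd.read_csv('train_abusive.csv', low_memory=False)
train_non_abusive = pd.read_csv('train_non_abusive.csv', low_memory=False)

# Raises an AssertionError describing the first difference if the frames are not equal
assert_frame_equal(train_abusive, train_non_abusive)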
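Note that assert_frame_equal does not return the differing rows; it either passes silently or raises an AssertionError. A minimal sketch that turns it into a yes/no check, using two small hypothetical frames and a hypothetical helper named frames_equal; DataFrame.equals is the built-in boolean alternative.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Two hypothetical frames that differ in a single cell.
left = pd.DataFrame({"text": ["a", "b"], "label": [1, 0]})
right = pd.DataFrame({"text": ["a", "c"], "label": [1, 0]})


def frames_equal(df1, df2):
    """Hypothetical helper: True if assert_frame_equal passes, False if it raises."""
    try:
        assert_frame_equal(df1, df2)
    except AssertionError:
        return False
    return True


print(frames_equal(left, left))   # True
print(frames_equal(left, right))  # False
print(left.equals(right))         # False
```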

I also saw an answer by Tom Chapin in another post that might interest you:

def get_different_rows(train_abusive, train_non_abusive):
    """Return the rows of train_non_abusive that do not appear in train_abusive."""
    # Outer merge on all shared columns; indicator=True adds a '_merge' column
    merged_df = train_abusive.merge(train_non_abusive, indicator=True, how='outer')
    # Keep only the rows that exist in the second (right) dataframe alone
    changed_rows_df = merged_df[merged_df['_merge'] == 'right_only']
    return changed_rows_df.drop('_merge', axis=1)
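The same merge can also return the uncommon records from both sides at once, by keeping every row whose '_merge' value is not "both". A minimal sketch on two hypothetical frames with identical columns; it assumes the frames share all their columns, since merge compares rows on the shared columns only, and rows that are duplicated inside a frame can produce repeated matches.

```python
import pandas as pd

# Hypothetical toy frames sharing the "text" and "label" columns.
old = pd.DataFrame({"text": ["a", "b", "c"], "label": [1, 0, 1]})
new = pd.DataFrame({"text": ["b", "c", "d"], "label": [0, 1, 0]})

# Outer merge on all shared columns; "_merge" is "left_only", "right_only" or "both".
merged = old.merge(new, how="outer", indicator=True)

# Rows present only in `new`, and rows present in exactly one of the two frames.
only_in_new = merged[merged["_merge"] == "right_only"].drop(columns="_merge")
uncommon = merged[merged["_merge"] != "both"].drop(columns="_merge")

print(only_in_new["text"].tolist())       # ['d']
print(sorted(uncommon["text"].tolist()))  # ['a', 'd']
```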

answered Oct 12 '18 at 02:50 by flmlopes, edited Oct 13 '18 at 00:37

I need to randomly sample records from a dataframe and delete them from the original dataframe. – VIVEK Oct 12 '18 at 22:38
You can try using assert to compare the two dataframes. – flmlopes Oct 13 '18 at 00:30
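For the goal stated in VIVEK's comment (sample rows, then delete them from the original), a minimal sketch that avoids drop_duplicates entirely: drop the sampled rows by index label, so duplicate rows elsewhere in the data are left alone. The helper name split_off_sample and the toy data are hypothetical, and the approach assumes the dataframe has a unique index, which is the case for the default RangeIndex that read_csv produces.

```python
import pandas as pd


def split_off_sample(df, frac, random_state=None):
    """Hypothetical helper: return (remaining, sampled) as disjoint dataframes."""
    sampled = df.sample(frac=frac, random_state=random_state)
    # Remove exactly the sampled rows, identified by their index labels.
    remaining = df.drop(sampled.index)
    return remaining, sampled


# Toy stand-in for train_abusive.csv (hypothetical data, "a" is duplicated).
train_abusive = pd.DataFrame({"text": ["a", "a", "b", "c", "d", "e", "f", "g", "h", "i"]})
train_abusive, val_abusive = split_off_sample(train_abusive, frac=0.1, random_state=42)

print(len(train_abusive), len(val_abusive))  # 9 1
```

Because rows are matched by index rather than by content, the two counts always add up to the original length.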