I am doing file-to-file and file-to-SQL comparisons. Because my data is large, I have to read it in chunks using the chunksize option in pandas. For testing purposes I used the same data for both the file-to-file and the file-to-SQL case. The file-to-file comparison works fine. However, when I use chunksize with read_sql_query, the first chunk is processed correctly, but I get the following error while the second chunk is being processed:
ValueError: Can only compare identically-labeled DataFrame objects
The error occurs specifically on this line:
ne_stacked = (src_df != tgt_df).stack()
I checked the columns of both DataFrames for differences, but they look identical:
print(src_df.columns)
Index(['firstname', 'lastname', 'account_num', 'salary', 'rental_income', 'int_yield', 'dividend', 'royalty', 'mortgage', 'car_loan', 'rent', 'other_expense', 'created_at', 'updated_at'], dtype='object')
print(tgt_df.columns)
Index(['firstname', 'lastname', 'account_num', 'salary', 'rental_income', 'int_yield', 'dividend', 'royalty', 'mortgage', 'car_loan', 'rent', 'other_expense', 'created_at', 'updated_at'], dtype='object')
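The column check only covers one of the two sets of labels that pandas compares; the row index is the other. The following is a minimal sketch of a check on both, using toy frames as stand-ins for one chunk from each source. The data and the index offsets are assumptions for illustration (a file chunk that keeps a running row index and a SQL chunk whose index restarts at 0), not the real chunks. It also shows one possible way to compare by position, under the further assumption that rows arrive in the same order from both sources.

```python
import pandas as pd

# Hypothetical stand-ins for the second chunk from each source.
# Assumption: same columns and values, but different row index labels.
src_df = pd.DataFrame({"firstname": ["a", "b"], "salary": [1, 2]}, index=[2, 3])
tgt_df = pd.DataFrame({"firstname": ["a", "b"], "salary": [1, 2]}, index=[0, 1])

# Check both sets of labels, not only the columns.
columns_match = src_df.columns.equals(tgt_df.columns)
index_match = src_df.index.equals(tgt_df.index)
print("columns match:", columns_match)
print("index match:", index_match)

# The element-wise comparison requires identical labels on both axes.
try:
    src_df != tgt_df
    raised = False
except ValueError as exc:
    raised = True
    print("comparison failed:", exc)

# Dropping the index labels makes the comparison positional.
# This is only meaningful if both chunks hold the same rows in the same order.
aligned_src = src_df.reset_index(drop=True)
aligned_tgt = tgt_df.reset_index(drop=True)
ne_stacked = (aligned_src != aligned_tgt).stack()
n_diffs = int(ne_stacked.sum())
print("differing cells:", n_diffs)
```

If the index check reports a mismatch on the real chunks, that would be consistent with the error appearing only from the second chunk onward, since the first chunk from either source would start at 0.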
Can you please help me figure out what is going on?