I have two gzipped CSV files, IMFBOP2017_1.csv.gz and IMFBOP2017_2.csv.gz, with the same columns in both files: Location, Indicator, Measure, Unit, Frequency, Date.
There are 60+ million rows in total.
I want to compare the two files and display the rows of IMFBOP2017_1 that are not present in IMFBOP2017_2.
My plan is to load both files into dataframes, add an extra column "compare" to each, fill it by joining all the fields together, like Location|Indicator|Measure|Unit|Frequency|Date, and then do a NOT IN operation on that column.
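For concreteness, that plan could be sketched in pandas roughly as follows. This is only an illustration under assumptions: the helper names `add_compare` and `rows_only_in_first` are made up, and the two tiny frames hold invented rows standing in for the real files.

```python
import pandas as pd

COLS = ["Location", "Indicator", "Measure", "Unit", "Frequency", "Date"]

def add_compare(df):
    # Hypothetical helper: join all six fields into one "compare" key.
    out = df.copy()
    out["compare"] = out[COLS].astype(str).agg("|".join, axis=1)
    return out

def rows_only_in_first(df1, df2):
    # Hypothetical helper: the NOT IN step. Keep rows of df1 whose
    # "compare" key does not occur anywhere in df2.
    a = add_compare(df1)
    b = add_compare(df2)
    return a.loc[~a["compare"].isin(b["compare"]), COLS]

# Invented sample rows. The real data would be loaded with something like
# pd.read_csv("IMFBOP2017_1.csv.gz", dtype=str), and likewise for the second file.
df1 = pd.DataFrame(
    [["US", "BCA", "USD", "Millions", "A", "2016"],
     ["FR", "BCA", "EUR", "Millions", "A", "2016"]],
    columns=COLS,
)
df2 = pd.DataFrame(
    [["US", "BCA", "USD", "Millions", "A", "2016"]],
    columns=COLS,
)

missing = rows_only_in_first(df1, df2)
print(missing)
```

Building the "compare" column means holding an extra string per row for both files, which is where the cost concern comes from.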
I think this is a costly process. Is there a simpler solution?