I am comparing two Excel files by looking up each value of one column in the other file; if the value is not present in the other file, the whole row should be written to a text file.
My Excel files are very large; they contain about 290,000 rows.
Here is what I have tried:
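import sys
import pandas as pd

# Redirect stdout so that print() writes to out.txt
orig_stdout = sys.stdout
f = open('out.txt', 'w')
sys.stdout = f

# Load Sheet1 of each workbook
df0 = pd.ExcelFile('1.xlsx').parse('Sheet1')
df1 = pd.ExcelFile('v2.xlsx').parse('Sheet1')

# Print the rows of df0 whose initial_id is not in df1
print(df0[~df0['initial_id'].isin(df1['initial_id'])])

# Restore stdout and close the file
sys.stdout = orig_stdout
f.close()
print('Done.')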
The goal is to compare each value in the initial_id column and, if it is not present in the second Excel file, write that whole row from the first file to the output file.
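To make the intended comparison concrete, here is a small sketch on two in-memory frames. The values and the name column are made up for illustration; only the initial_id column name comes from the code above. The boolean mask keeps every row of the first frame whose initial_id does not occur in the second.

```python
import pandas as pd

# Hypothetical stand-ins for the two sheets; only the column name
# initial_id is taken from the real files.
df0 = pd.DataFrame({"initial_id": [21, 22, 23, 24], "name": ["a", "b", "c", "d"]})
df1 = pd.DataFrame({"initial_id": [22, 24, 99]})

# True where an initial_id from df0 also occurs in df1.
present = df0["initial_id"].isin(df1["initial_id"])

# Keep the rows of df0 whose initial_id is absent from df1.
missing = df0[~present]
missing_ids = missing["initial_id"].tolist()
```

With these made-up values, the rows with initial_id 21 and 23 are kept, so the filter itself seems to do what I want.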
Actual Result
21 EXCLAMATION MARK A1 INVERTED EXCLAMATION MARK
22 QUOTATION MARK A2 CENT SIGN
23 NUMBER SIGN A3 POUND SIGN
24 DOLLAR SIGN A4 CURRENCY SIGN
25 PERCENT SIGN A5 YEN SIGN
26 AMPERSAND A6 BROKEN BAR
27 APOSTROPHE A7 SECTION SIGN
... ... ... ...
3159 DIGIT NINE B9 SUPERSCRIPT ONE
3160 COLON BA MASCULINE ORDINAL INDICATOR
3161 SEMICOLON BB RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
3162 LESS-THAN SIGN BC VULGAR FRACTION ONE QUARTER
3163 EQUALS SIGN BD VULGAR FRACTION ONE HALF
Expected Result
The rows missing after 27 should also be written to the file. If holding everything in RAM is a problem, splitting the output into part files would also work.
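A minimal sketch of the part-file idea, assuming the filtered rows are already in a DataFrame. The function name write_parts, the rows_per_part parameter, and the out_partN.txt naming are all hypothetical. It relies on DataFrame.to_csv, which writes every row, whereas print abbreviates a long frame with "..." once it exceeds the pandas display limit.

```python
import pandas as pd


def write_parts(df, prefix="out_part", rows_per_part=100_000):
    """Write df to tab-separated text files of at most rows_per_part rows.

    Hypothetical helper: the name, prefix, and default size are
    illustrative. Returns the list of file names written.
    """
    paths = []
    for part, start in enumerate(range(0, len(df), rows_per_part), start=1):
        path = f"{prefix}{part}.txt"
        # iloc slices by position, so each part holds consecutive rows.
        chunk = df.iloc[start:start + rows_per_part]
        # Each part file repeats the header line; the index is left out.
        chunk.to_csv(path, sep="\t", index=False)
        paths.append(path)
    return paths
```

Called as write_parts(df0[~df0['initial_id'].isin(df1['initial_id'])]), this would produce out_part1.txt, out_part2.txt, and so on, each with its own header line.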