I have come very close to solving this, but the final result is always missing a some of the information.
I want to take every reference number that has the code ESBK022, ESBK023, or ESBK024 from the lookup file, and then remove every instance of that reference number from the mainfile.
In the main file, there are no duplicate reference numbers as the display code only appears in the lookup file.
Here is what I want to accomplish:
Main File:
REFERENCE NUMBER DISPLAY CODE
1922 ESBK022
1923 ESBK023
1924 ESBK022
1925 ESBK024
1922 IATL123
1922 AUXT876
Lookup file:
REFERENCE NUMBER
1922
1923
1924
1925
1926
1927
Final Result:
REFERENCE NUMBER
1926
1927
Here is my code:
import pandas as pd
excel1 = 'desktop/lookup.xlsx'
excel2 = 'desktop/mainfile.xlsx'
#Delete top 3 blank rows of main file and last two summary rows
df1 = pd.read_excel(excel1, header=3)
df1 = (df1[:-2])
df2 = pd.read_excel(excel2)
#finds all reference numbers that have the ESBK code
has_code = df1[(df1['DISPLAY CODE'] == 'ESBK022') | (df1['DISPLAY CODE'] == 'ESBK023')
| (df1['DISPLAY CODE'] == 'ESBK024')]
I've played around a lot with this part. Right now this just results in the table showing only reference numbers that have an ESBK code, and not the other way around.
#copies merged table layout to setup concatenation
merge_codes = pd.merge(has_code, df2, on="REFERENCE NUMBER")
merge_codes.to_excel('Desktop/ExcelTest.xlsx',sheet_name='DATA', index=False)