2

I have come very close to solving this, but the final result is always missing a some of the information.

I want to take every reference number that has the code ESBK022, ESBK023, or ESBK024 from the lookup file, and then remove every instance of that reference number from the mainfile.

In the main file, there are no duplicate reference numbers as the display code only appears in the lookup file.

Here is what I want to accomplish:

Main File:

REFERENCE NUMBER     DISPLAY CODE
      1922              ESBK022
      1923              ESBK023
      1924              ESBK022
      1925              ESBK024
      1922              IATL123
      1922              AUXT876

Lookup file:

REFERENCE NUMBER
      1922
      1923
      1924
      1925
      1926
      1927

Final Result:

REFERENCE NUMBER
      1926
      1927

Here is my code:

import pandas as pd


excel1 = 'desktop/lookup.xlsx'
excel2 = 'desktop/mainfile.xlsx'

#Delete top 3 blank rows of main file and last two summary rows
df1 = pd.read_excel(excel1, header=3)
df1 = (df1[:-2])

df2 = pd.read_excel(excel2)


#finds all reference numbers that have the ESBK code
has_code = df1[(df1['DISPLAY CODE'] == 'ESBK022') | (df1['DISPLAY CODE'] == 'ESBK023')
              | (df1['DISPLAY CODE'] == 'ESBK024')]

I've played around a lot with this part. Right now this just results in the table showing only reference numbers that have an ESBK code, and not the other way around.

#copies merged table layout to setup concatenation
merge_codes = pd.merge(has_code, df2, on="REFERENCE NUMBER")

merge_codes.to_excel('Desktop/ExcelTest.xlsx',sheet_name='DATA', index=False)
Holden
  • 23
  • 4

1 Answers1

0

Use:

l=df1.loc[df1['DISPLAY CODE'].isin(['ESBK022','ESBK023','ESBK024']),'REFERENCE NUMBER'].tolist()
df2[~df2['REFERENCE NUMBER'].isin(l)]

Output:

   REFERENCE NUMBER
4              1926
5              1927
ansev
  • 30,322
  • 5
  • 17
  • 31
  • I did a variation of this: ``` has_code = df1[(df1['DISPLAY CODE'] == 'ESBK022') | (df1['DISPLAY CODE'] == 'ESBK023') | (df1['GNC DISPLAY CODE'] == 'ESBK024')] final_table = df2.loc[~df2['REFERENCE NUMBER'].isin(has_code['REFERENCE NUMBER'])] ``` Can you explain how yours works exactly and how I can save the result as its own table? – Holden Dec 04 '19 at 23:06
  • 1
    i'm glad to help:) – ansev Dec 04 '19 at 23:08