0

I'm trying to compare two excel spreadsheets, remove the names that appear in both spreadsheets from the first spreadsheet and then export it to a csv file using python. I am new, but here's what I have so far:

import pandas as pd
data_1 = pd.read_excel (r'names1.xlsx')
bit_data = pd.DataFrame(data_1, columns= ['Full_Name'])
bit_col = len(bit_data)
data_2 = pd.read_excel (r'force_names.xlsx')

force_data = pd.DataFrame(data_2, columns= ['FullName'])

force_col = len(force_data)


for bit_num in range(bit_col):

  for force_num in range(force_col):

        if bit_data.iloc[bit_num,0] == force_data.iloc[force_num,0]:

            data_1 = data_1.drop(data_1.index[[bit_num]])

            data_1.to_csv(r"/Users/name/Desktop/Reports/Names.csv")


                   

When I run it it gets rid of some duplicates but not all, any advice anyone has would be greatly appreciated.

akejay123
  • 1
  • 1
  • And what problem are you facing ? – Sandeep Sharma Jul 01 '21 at 18:11
  • Have a look at [How to make good pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) and [edit] your question to include a [mcve] showing sample input dataframes, current output, and expected output – G. Anderson Jul 01 '21 at 18:17
  • Also note that pandas has builtin `merge()`, `isin()`, and `drop_duplicates()` functions that could be helpful here – G. Anderson Jul 01 '21 at 18:18

1 Answers1

0

Use pandas merge to get all unique names, with no duplicates. If you want to drop any names that are in both files (I'm not sure if that's what you're asking), you can do so. See this toy example:

row1list = ['G. Anderson']
row2list = ['Z. Ebra']
df1 = pd.DataFrame([row1list, row2list], columns=['FullName'])

row1list = ['G. Anderson']
row2list = ['C. Obra']
df2 = pd.DataFrame([row1list, row2list], columns=['FullName'])

df3 = df1.merge(df2, on='FullName', how='outer', indicator=True)

print(df3)
#       FullName      _merge
# 0  G. Anderson        both
# 1      Z. Ebra   left_only
# 2      C. Obra  right_only

df3 = df3.loc[df3['_merge'] != 'both']
del df3['_merge']

print(df3)
#   FullName
# 1  Z. Ebra
# 2  C. Obra

Hammurabi
  • 1,141
  • 1
  • 4
  • 7