I want to compare two sets of data and find matches. I have one file (file1.csv) 1400 lines that looks like this:
ID
123
456
789
145
165
175
185
195
I have another file (file2.csv) around 46,000 lines that looks like this:
accountNumber, user, accountNumber2, address
456,example@email.com,0,1 Lane
001,example1@email.com,175,2 Lane
002,example2@email.com,789,3 Lane
195,example3@email.com,0,4 Lane
123,example4@email.com,0,5 Lane
689,example5@email.com,0,6 Lane
003,example6@email.com,0,7 Lane
004,example7@email.com,0,8 Lane
I want use file1 ID's to match up with emails in file2. The output I would like is a new file(file3) like:
ID,Email
123, example4@email.com
456, example@email.com
789, example2@email.com
145, Not found
165, Not found
175, example1@email.com
185, Not found
195, example3@email.com
Here's what I have tried:
import pandas as pd
file1_df = pd.read_csv('file1.csv')
file2_df = pd.read_csv('file2.csv')
def search():
for account_id in file1_df['account_id']:
print("ID: ",account_id)
id_loc = file1_df[file1_df['account_id'] == account_id].index.values
print("id_loc",id_loc)
try:
accountNumber = file2_df[file2_df['accountNumber'] == account_id]['user'].values[0]
print(accountNumber)
accountNumber_loc = file2_df.loc[file2_df['accountNumber'] == account_id].index.values
print(account_id, "Found at: ", accountNumber_loc)
file1_df.loc[id_loc, "Located"] = accountNumber
except Exception:
pass
try:
accountNumber2 = file2_df[file2_df['accountNumber2'] == account_id]['user'].values[0]
print(accountNumber2)
accountNumber2_loc = file2_df.loc[file2_df['accountNumber2'] == account_id].index.values
print(account_id, "Found at: ", accountNumber2_loc)
file1_df.loc[id_loc, "Located"] = accountNumber2
except Exception:
print("Not Found")
file1_df.loc[id_loc, "Located"] = "Not found"
search()
file1_df.to_csv('file3.csv')
I keep getting the error:
IndexError: index 0 is out of bounds for axis 0 with size 0
It seems like it almost works with small files but once I try with the real version I just keep getting the IndexError. Is there a better way to find these matches?