I'm fairly new to programming, and I am trying to write a Python program that compares two .csv files by specific columns and checks for additions, removals, and modifications. Both .csv files are in the following format, contain the same number of columns, and use BillingNumber as the key:
BillingNumber,CustomerName,IsActive,IsCreditHold,IsPayScan,City,State
"2","CHARLIE RYAN","Yes","No","Yes","Reading","PA"
"3","INSURANCE BILLS","","","","",""
"4","AAA","","","","",""
I need to compare only columns 0, 1, 2, and 4 (BillingNumber, CustomerName, IsActive, and IsPayScan). I have tried many different ways to accomplish this, but I haven't had any luck. I understand that I can load the files into dictionaries or lists using csv.DictReader or csv.reader, but after that I get stuck. I'm not sure exactly where or how to start once they are in memory.
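For reference, this is roughly the loading step I have in mind: a minimal sketch, assuming the header row shown above. The names load_rows and COMPARE_COLUMNS are just placeholders I made up, and the sample is read from an in-memory string instead of the real files so it runs on its own.

```python
import csv
import io

# Columns 0, 1, 2 and 4 of the header shown above (assumed column names).
COMPARE_COLUMNS = ["BillingNumber", "CustomerName", "IsActive", "IsPayScan"]


def load_rows(csv_file, key="BillingNumber", columns=COMPARE_COLUMNS):
    """Return {key value: {column: value}}, keeping only the compared columns.

    Hypothetical helper: if a key appears twice, the later row wins.
    """
    reader = csv.DictReader(csv_file)
    return {row[key]: {col: row[col] for col in columns} for row in reader}


# Inline sample standing in for an open file object.
sample = io.StringIO(
    "BillingNumber,CustomerName,IsActive,IsCreditHold,IsPayScan,City,State\n"
    '"2","CHARLIE RYAN","Yes","No","Yes","Reading","PA"\n'
    '"3","INSURANCE BILLS","","","","",""\n'
)
rows = load_rows(sample)
print(rows["2"])
```

With the real files, I assume the same function could be called on something like open('Old/file1.csv', newline='') instead of the in-memory sample.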
I tried this previously:
import time
old_lines = set((line.strip() for line in open(r'Old/file1.csv', 'r+')))
file_new = open(r'New/file2.csv', 'r+')
choice = 0
choice = int( input('\nPlease choose your result format.\nEnter 1 for .txt, 2 for .csv or 3 for .json\n') )
time.sleep(1)
print(".")
time.sleep(1)
print("..")
time.sleep(1)
print("...")
time.sleep(1)
print("....")
time.sleep(1)
print('Done! Check "Different" folder for results.\n')
if choice == 1:
    file_diff = open(r'Different/diff.txt', 'w')
elif choice == 2:
    file_diff = open(r'Different/diff.csv', 'w')
elif choice == 3:
    file_diff = open(r'Different/diff.json', "w")
else:
    print ("You MUST enter 1, 2 or 3")
    exit()
for line in file_new:
    if line.strip() not in old_lines:
        file_diff.write("** ERROR! Entry "+ line + "** Does not match previous file\n\n")
file_new.close()
file_diff.close()
It doesn't work properly because if there is an additional line, or one is missing, it logs everything after that line as different. It also compares the whole line, which is not what I want. This was basically just a starting point, and although it kind of worked, it isn't specific enough for what I need. I'm really just looking for a good place to start. Thanks!
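To make the goal concrete, here is a sketch of the kind of key-based comparison I think I need, assuming each file has already been loaded into a dict keyed by BillingNumber that holds only the compared columns. The diff_rows name and the sample values are made up for illustration and are not from my real data.

```python
def diff_rows(old, new):
    """Classify keys as added, removed, or modified (hypothetical helper).

    Both arguments are {BillingNumber: {column: value}} dicts. Keys are
    strings, so the sorted() results are in text order, not numeric order.
    """
    added = sorted(new.keys() - old.keys())        # only in the new file
    removed = sorted(old.keys() - new.keys())      # only in the old file
    modified = sorted(
        key for key in old.keys() & new.keys()     # in both files
        if old[key] != new[key]                    # compared columns differ
    )
    return added, removed, modified


# Made-up sample data standing in for the two loaded files.
old = {
    "2": {"CustomerName": "CHARLIE RYAN", "IsActive": "Yes", "IsPayScan": "Yes"},
    "3": {"CustomerName": "INSURANCE BILLS", "IsActive": "", "IsPayScan": ""},
    "4": {"CustomerName": "AAA", "IsActive": "", "IsPayScan": ""},
}
new = {
    "2": {"CustomerName": "CHARLIE RYAN", "IsActive": "No", "IsPayScan": "Yes"},
    "3": {"CustomerName": "INSURANCE BILLS", "IsActive": "", "IsPayScan": ""},
    "5": {"CustomerName": "NEW CUSTOMER", "IsActive": "Yes", "IsPayScan": ""},
}

added, removed, modified = diff_rows(old, new)
print("added:", added)        # added: ['5']
print("removed:", removed)    # removed: ['4']
print("modified:", modified)  # modified: ['2']
```

Because the lookup is by key rather than by line position, an inserted or missing row should not cause the rows after it to be reported as different.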