I'm having trouble explaining my issue to my coworkers, so I'm going to try to keep this simple and theoretical here.
I'm using itertools.zip_longest() to loop over two files in a single for loop:
for data1, data2 in itertools.zip_longest(file_1, file_2):
file_1 contains 958 lines and file_2 contains 215 lines.
I want to check, line by line, whether each of lines 1-958 in file_1 exists anywhere in file_2. I'm using itertools.zip_longest() because of the size difference, but as I understand it, if the files are of unequal length, the missing values for the shorter file default to None or to whatever fillvalue you designate.
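To show the padding behavior I mean, here is a small sketch. The two short lists are made-up stand-ins for my files, not the real data:

```python
import itertools

# Made-up stand-ins for the two files; the real ones have 958 and 215 lines.
file_1 = ["a1", "b1", "c1", "d1"]
file_2 = ["a2", "b2"]

# Default padding: once file_2 runs out, its slot is filled with None.
padded_default = list(itertools.zip_longest(file_1, file_2))

# Custom padding: the fillvalue keyword replaces None with a value you choose.
padded_custom = list(itertools.zip_longest(file_1, file_2, fillvalue=""))

print(padded_default)  # [('a1', 'a2'), ('b1', 'b2'), ('c1', None), ('d1', None)]
print(padded_custom)   # [('a1', 'a2'), ('b1', 'b2'), ('c1', ''), ('d1', '')]
```

So with my real files, every line of file_1 past line 215 would be paired with the fill value rather than with any line of file_2.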
It seems like zip() sets up a 1-to-1 relationship between the files: you can check a1 against a2, b1 against b2, and so on. I'm looking to check a1 against a2, a1 against b2, a1 against c2, and so on, until a1 either finds a match and is satisfied, or finds none and does a thing. Then it should move on to b1 and do the same, and so on for all 958 lines.
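To make the difference concrete, here is a toy sketch with made-up three-item lists in place of my files. The nested comprehension is only my guess at the shape of the comparison I want, not something I have working on the real data:

```python
# Made-up stand-ins for a few lines from each file.
file_1 = ["a1", "b1", "c1"]
file_2 = ["a2", "b2", "c2"]

# What zip() gives me: one pair per position.
one_to_one = list(zip(file_1, file_2))

# What I'm after: each line of file_1 paired with every line of file_2.
each_against_all = [(left, right) for left in file_1 for right in file_2]

print(one_to_one)            # [('a1', 'a2'), ('b1', 'b2'), ('c1', 'c2')]
print(each_against_all[:3])  # [('a1', 'a2'), ('a1', 'b2'), ('a1', 'c2')]
```

The first list has 3 pairs and the second has 9, which is the "a1 against everything, then b1 against everything" pattern I'm describing.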
This is how I am imagining it working, but it does not work the way I need it to. Is there a way to compare lines of text from two separate files in the way I'm describing?
from fuzzywuzzy import process
import itertools

# file_1: 958 lines of nasty user-input address data
# file_2: 215 lines of clean, correct street names from the county
for data1, data2 in itertools.zip_longest(file_1, file_2):
    if data1 not in data2:
        # Select the closest match to fix the typo; data1 becomes that match
        data1 = process.extractOne(data1, data2)
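For comparison, here is a rough sketch of the behavior I'm after, under some loud assumptions: the lists are made-up toy data, the helper name fix_lines is hypothetical, and the standard library's difflib.get_close_matches stands in for fuzzywuzzy so the example runs without extra packages. It is not my real code.

```python
import difflib

def fix_lines(dirty_lines, clean_lines):
    """Return dirty_lines with each unmatched line replaced by its closest clean line."""
    clean_set = set(clean_lines)  # exact-match lookup against ALL clean lines
    fixed = []
    for line in dirty_lines:
        if line in clean_set:
            # Exact match found somewhere in the clean data: keep the line as is.
            fixed.append(line)
            continue
        # No exact match: ask for the single closest clean line.
        # difflib is a stand-in for fuzzywuzzy here (assumption for this sketch).
        matches = difflib.get_close_matches(line, clean_lines, n=1)
        # If nothing is close enough, leave the line unchanged.
        fixed.append(matches[0] if matches else line)
    return fixed

# Made-up toy data standing in for the two files.
file_1 = ["Main Street", "Mian Street", "Oak Avenue", "Elm Raod"]
file_2 = ["Main Street", "Oak Avenue", "Elm Road"]

fixed = fix_lines(file_1, file_2)
print(fixed)  # ['Main Street', 'Main Street', 'Oak Avenue', 'Elm Road']
```

The point of the sketch is that each line of the dirty data is compared against the whole clean list, not just against the line at the same position.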