
I'm having trouble explaining my issue to my coworkers, so I'm going to try to keep this simple and theoretical here.

I'm using zip() (more precisely, itertools.zip_longest()) to drive a for loop over two files at once:

for data1, data2 in itertools.zip_longest(file_1, file_2):

file_1 contains 958 lines and file_2 contains 215 lines. I want to check, line by line, whether each of lines 1-958 in file_1 exists in file_2. I'm using itertools.zip_longest() because of the size difference, but the way I understand it, if the files are of unequal length, the leftover lines are paired with None or with a fill value you designate.
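To make the pairing behaviour concrete, here is a small sketch using hypothetical in-memory lists in place of the two files. It shows that zip() stops at the shorter input, while itertools.zip_longest() pads the shorter one with None (or with whatever you pass as fillvalue). Either way, each line is paired with exactly one other line, never with all of them.

```python
from itertools import zip_longest

# Hypothetical sample data standing in for the two files
file_1_lines = ["1 Main St", "22 Oak Ave", "9 Elm Rd"]
file_2_lines = ["Main St"]

# zip() stops as soon as the shorter input runs out
short = list(zip(file_1_lines, file_2_lines))

# zip_longest() keeps going, padding the shorter input with None by default
padded = list(zip_longest(file_1_lines, file_2_lines))

# ...or with a fill value of your choosing
padded_custom = list(zip_longest(file_1_lines, file_2_lines, fillvalue=""))

print(short)          # [('1 Main St', 'Main St')]
print(padded)         # [('1 Main St', 'Main St'), ('22 Oak Ave', None), ('9 Elm Rd', None)]
print(padded_custom)  # [('1 Main St', 'Main St'), ('22 Oak Ave', ''), ('9 Elm Rd', '')]
```

Note that once the padding kicks in, a test like `data1 not in data2` is evaluated against None, which raises `TypeError: argument of type 'NoneType' is not iterable`.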

It seems like zip() sets up a 1-to-1 relationship between the files (where a1 is the first line of file_1, a2 the first line of file_2, and so on): you can check a1 against a2, b1 against b2, and so on. What I'm looking for is to check a1 against a2, a1 against b2, a1 against c2, and so on, until a1 either finds a match (and is satisfied) or doesn't (and does something), then move on to b1 and do the same, and so on for all 958 lines.
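The "a1 against a2, a1 against b2, ..." pattern described here is a nested loop rather than a zip. A minimal sketch, again with hypothetical lists in place of the files: the outer loop walks file_1, the inner loop walks all of file_2, `break` stops as soon as a match is found, and the `else` clause of the inner `for` runs only when no match was found.

```python
# Hypothetical sample data standing in for the two files
file_1_lines = ["Main St", "Oak Avenue", "Elm Rd"]
file_2_lines = ["Elm Rd", "Main St", "Oak Ave"]

unmatched = []
for line1 in file_1_lines:          # a1, b1, c1, ...
    for line2 in file_2_lines:      # a2, b2, c2, ...
        if line1 == line2:
            break                   # match found: move on to the next line1
    else:
        # The inner loop finished without hitting break: nothing in file_2 matched
        unmatched.append(line1)

print(unmatched)  # ['Oak Avenue']
```

For an exact comparison like this, `line1 in file_2_lines` performs the same inner search in one expression; the explicit inner loop is only needed when the comparison is something other than equality.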

This is how I imagine it working, but it doesn't work the way I need it to. Is there a way to compare lines of text from two separate files in the way I'm describing?

from fuzzywuzzy import process
import itertools

file_1 = ...  # 958 lines of nasty user input address data
file_2 = ...  # 215 lines of clean, correct street names from the county


for data1, data2 in itertools.zip_longest(file_1, file_2):
    if data1 not in data2:
        data1 = process.extractOne(data1, data2)  # Selects and becomes closest match to fix typo
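A sketch of the intended "fix the typo" step, under a few stated assumptions. It swaps in the standard library's difflib.get_close_matches() for fuzzywuzzy so it runs without third-party packages; that is a different similarity measure, not fuzzywuzzy's. The function name correct_addresses, the cutoff of 0.6, and the sample data are hypothetical. The key change from the loop above is that each dirty entry is compared against the whole list of clean names rather than against one paired line, and the result is collected into a new list (reassigning data1 inside the loop changes nothing outside it). If you keep fuzzywuzzy, the analogous call would be `process.extractOne(entry, clean)`, which for a list of choices returns a `(match, score)` tuple rather than just the matched string.

```python
import difflib


def correct_addresses(dirty_lines, clean_names, cutoff=0.6):
    """Hypothetical helper: replace each dirty entry with its closest clean name.

    Entries with no clean name scoring at least `cutoff` are kept unchanged.
    """
    clean = [name.strip() for name in clean_names]
    clean_set = set(clean)  # fast exact-match check
    corrected = []
    for raw in dirty_lines:
        entry = raw.strip()
        if entry in clean_set:
            corrected.append(entry)  # already correct, keep as-is
            continue
        # Compare this one entry against every clean name, keep the single best
        matches = difflib.get_close_matches(entry, clean, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else entry)
    return corrected


# Hypothetical sample data
dirty = ["Main St", "Mian St", "Oak Avnue", "Zzzz"]
clean = ["Main St", "Oak Avenue", "Elm Road"]
print(correct_addresses(dirty, clean))
```

Raising `cutoff` makes the matcher more conservative; with real address data it is worth logging the entries that fall back to their original text so they can be reviewed by hand.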
martineau
  • `zip()` is for iterating in parallel. If you want a cross product, use nested loops. – Barmar Jan 25 '22 at 22:05
  • Also note that files aren't the same as a list of lines. You need to read all the lines in file1 and check to see if each one is in each of those from file2, so again, `zip()` is not the tool needed to do that. – martineau Jan 25 '22 at 22:24
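The point that a file is not a list of lines matters for the nested-loop approach: a file object is an iterator, so after the first full pass over file_2 it yields nothing more, and a nested loop written directly over the file would only ever test the first line of file_1. A minimal sketch, using io.StringIO as an assumed stand-in for `open(...)` so it runs without real files: read file_2 once, strip the trailing newlines (otherwise `"Main St\n"` never equals `"Main St"`), and keep the names in a set so each file_1 line is checked against all of them in a single lookup.

```python
import io

# io.StringIO stands in for open(...); a real file object is consumed the same way
file_2 = io.StringIO("Main St\nOak Ave\n")
first_pass = [line.rstrip("\n") for line in file_2]
second_pass = [line.rstrip("\n") for line in file_2]  # iterator already exhausted
print(first_pass)   # ['Main St', 'Oak Ave']
print(second_pass)  # []

# Read file_2 once, strip whitespace, and keep the clean names in a set
file_2 = io.StringIO("Main St\nOak Ave\n")
clean_names = {line.strip() for line in file_2}

# Every file_1 line is now checked against all clean names in one set lookup
file_1 = io.StringIO("Main St\nOak Avenue\n")
results = [(line.strip(), line.strip() in clean_names) for line in file_1]
print(results)  # [('Main St', True), ('Oak Avenue', False)]
```

With real files, `with open(path) as f: lines = f.read().splitlines()` produces the same kind of newline-free list.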

0 Answers