I would like to compare two columns between two CSVs, find partial string matches, then append all data from the matching line (including equivalent within columns) at the correct index in the first CSV, then output the compiled matches in a third CSV, without editing either of the first two CSVs. All lines in the first CSV must appear on the third, whether or not matches were found, but only matching lines from the second CSV. For example:
CSV_1 CSV_2
Name City Date Name_of_thing City_of_Origin Time
Examp. Bton 7/11 huh, inc. Lton, AMERICA 7/10/2020 00:00
Nomatch Cton 10/10 THE EXAMPLE, LLC Bton, USA 7/11/2020 00:00
Would appear as
CSV_3
Name City Date Name_of_thing City_of_Origin Time
Examp. Bton 7/11 THE EXAMPLE, LLC Bton, USA 7/11/2020 00:00
Nomatch Cton 10/10
I posted earlier here asking about a fuzzy matching approach (Record linking two large CSVs in Python?), but the solution ended up being very time-inefficient, so I'm hoping a simple regex search will be sufficient. These files are both very large and not the same length, so I'm wondering if there's a solution which allows all entries in CSV_1 to be compared against those in CSV_2 in an efficient way.
Unfortunately, the constraints of the project do not allow me to load the CSVs into a database, which would make this much easier, so I need to rely entirely upon Pandas for this.
Thanks!