This is my second question. I have three very large CSV files with a similar number of columns but a different number of rows in each file. Some rows appear in all three files. I need to extract the rows that are common to all three files and write them to a new CSV file. An example of one of the CSV files is below:
ID Date Text
234212 'Thu Jun 23 04:16:27 +0000 2013' Any Text
234213 'Thu Jun 23 04:16:28 +0000 2013' Any Text
234214 'Thu Jun 23 04:16:29 +0000 2013' Any Text
.......and so on
Rows that share an ID across the three files also have the same text, so the data can be filtered either on the ID column alone or by comparing whole rows. My code for finding the common rows in all three files is below:
import csv
csvInputFile1=open('inputFile1.csv', 'r', encoding="utf-8", newline='')
csvInputFile2=open('inputFile2.csv', 'r', encoding="utf-8", newline='')
csvInputFile3=open('inputFile3.csv', 'r', encoding="utf-8", newline='')
csvOutputFile=open('outputSimilarData.csv', 'w', encoding="utf-8", newline='')
csvReader1 = csv.reader(csvInputFile1)
csvReader2 = csv.reader(csvInputFile2)
csvReader3 = csv.reader(csvInputFile3)
#next(csvReader3)
csvWriter = csv.writer(csvOutputFile)
for row1 in csvReader1:
    row2 = next(csvReader2)
    row3 = next(csvReader3)
    #print(row2)
    #print(row3[0])
    if row1 != row2 and row1 != row3:
    #if row1 not in row and row1 not in row3:
        print(row1)
        csvWriter.writerow(row1)
        #continue
csvOutputFile.close()
csvInputFile3.close()
csvInputFile2.close()
csvInputFile1.close()
I am using file 1 as the first input file because it has the fewest rows; file 2 has more rows than file 1, and file 3 is the largest. So I iterate through file 1 and compare its IDs with the other two files, and if the ID is present there, the row should be printed and written to a new CSV file.
As you can see, I am using "not equal to" (or "not in") in the code, and that works fine: it prints all the rows from file 1 that are not in file 2 and file 3 (the unique IDs in file 1).
The problem is that when I use "==" or "in" instead to find the common IDs, the code does not work: nothing is printed and the new CSV file is empty.
I have not been able to solve this and would really appreciate it if someone could help me out. Thanks.
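A note on the likely cause, with a sketch of one possible approach. The loop above reads the three files in lockstep, so row1 is only ever compared with the row at the same position in file 2 and file 3. Because the files have different lengths and the common rows are unlikely to sit on the same line numbers, "==" is almost never true, while "!=" is almost always true. Also, "row1 in row2" asks whether the whole list row1 is a single field of row2, which it never is. The sketch below instead collects the IDs of file 2 and file 3 into sets and keeps the rows of file 1 whose ID is in both. It assumes the ID is in the first column, that the files are comma-separated, and that the IDs of the two larger files fit in memory; the function names read_ids and write_common_rows are made up for illustration.

```python
import csv


def read_ids(path, id_column=0):
    """Return the set of values found in the ID column of a CSV file."""
    with open(path, "r", encoding="utf-8", newline="") as f:
        # Skip completely empty lines, which csv.reader yields as [].
        return {row[id_column] for row in csv.reader(f) if row}


def write_common_rows(path1, path2, path3, out_path, id_column=0):
    """Write the rows of path1 whose ID also appears in path2 and path3.

    Returns the number of rows written.
    """
    # IDs present in both larger files (assumption: they fit in memory).
    shared_ids = read_ids(path2, id_column) & read_ids(path3, id_column)

    written = 0
    with open(path1, "r", encoding="utf-8", newline="") as src, \
         open(out_path, "w", encoding="utf-8", newline="") as dst:
        writer = csv.writer(dst)
        # Stream file 1 row by row; set membership does not depend on
        # where the matching row sits in the other files.
        for row in csv.reader(src):
            if row and row[id_column] in shared_ids:
                writer.writerow(row)
                written += 1
    return written
```

With the file names from the question, this would be called as write_common_rows('inputFile1.csv', 'inputFile2.csv', 'inputFile3.csv', 'outputSimilarData.csv'). If every file starts with the same header row, that header is treated like any other row and copied to the output. To match on whole rows rather than on the ID alone, the same idea should work with sets of tuple(row) in place of sets of IDs.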