As a Python beginner, I'm stuck yet again! I have two CSV files as follows:
CSV1 (Master List)
ID Name Code. Prop
12 SAB 1234 ZXC
12 SAB 1236 ZXC
12 SAB 1233 ZXC
12 SAB 1234 ZXC
11 ASN 1234 ABV
16 HGF 1233 AAA
19 MAB 8765 BCT
19 MAB 8754 BCT
CSV2 (Subset)
ID Name Code. Prop
12 SAB 1234 ZXC
12 SAB 1236 ZXC
12 SAB 1233 ZXC
12 SAB 1234 ZXC
19 MAB 8765 BCT
19 MAB 8754 BCT
My goal is to compare the values in the first column (ID) of the two CSVs and identify the rows in CSV1 whose ID does not occur in CSV2.
EDIT: In the above example, the rows with ID 11 and 16 from CSV1 (Master List) should get exported.
One thing to consider: although each ID is a unique identifier, it appears on multiple rows in both CSV files (as the sample data above shows).
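To make the expected result concrete, here is a minimal sketch that works on the sample rows held in memory rather than on the real files. The names `master_rows`, `subset_rows`, `subset_ids`, and `expected` are placeholders for illustration only, and the whitespace-separated sample values are assumed to be the four columns of each row.

```python
# Sample rows copied from the two listings above (illustration only).
master_rows = [
    ["12", "SAB", "1234", "ZXC"],
    ["12", "SAB", "1236", "ZXC"],
    ["12", "SAB", "1233", "ZXC"],
    ["12", "SAB", "1234", "ZXC"],
    ["11", "ASN", "1234", "ABV"],
    ["16", "HGF", "1233", "AAA"],
    ["19", "MAB", "8765", "BCT"],
    ["19", "MAB", "8754", "BCT"],
]
subset_rows = [
    ["12", "SAB", "1234", "ZXC"],
    ["12", "SAB", "1236", "ZXC"],
    ["12", "SAB", "1233", "ZXC"],
    ["12", "SAB", "1234", "ZXC"],
    ["19", "MAB", "8765", "BCT"],
    ["19", "MAB", "8754", "BCT"],
]

# Collect every ID seen in the subset; repeated IDs collapse inside a set.
subset_ids = set(row[0] for row in subset_rows)

# Keep each master row whose ID never appears in the subset.
expected = [row for row in master_rows if row[0] not in subset_ids]
print(expected)
```

On this sample, `expected` holds only the two rows with ID 11 and ID 16. A master ID that repeats would keep all of its rows, because the filter runs once per master row.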
I have gone through a few threads on this website such as this one. What I am trying to achieve is the exact opposite of what is asked there, but I cannot understand the solution in that thread.
I have attempted to get some results but to no avail. I have attached the code that I am using below:
import csv

fOpen1=open('C:\Master.csv')
fOpen2=open('C:\Subset.csv')
fOutput1=open('C:\Untagged.csv', 'wb')
master=csv.reader(fOpen1)
subset=csv.reader(fOpen2)
untagged=csv.writer(fOutput1)
count=0
subsetCopy=list()
header1=master.next()
header2=subset.next()
untagged.writerow(header1)
for row2 in subset:
    subsetCopy.append(row2)
for row1 in master:
    for row2 in subsetCopy:
        if row1[0] != row2[0]:
            count=count+1
            untagged.writerow(row1)
print count
When I run this I get absurd results, with count in the order of millions. The weird thing is that I used this exact same code with == instead of != to achieve another goal, and it worked like a charm. I thought changing the equality condition would give me the opposite result. Instead, it ends up producing a huge CSV file with nothing useful. I also tried to use a dictionary, but then realised it may not work because of the duplicate records in both files. It is important for me to get all the instances of a particular row in both files.
Where am I going wrong? Any advice/suggestions are welcome.
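One possible explanation, assuming the loops are nested as in the code above: with `!=`, each master row is counted once for every subset row whose ID differs, so the count grows roughly with the product of the two file sizes. The sketch below uses only the sample ID columns and compares that pairwise count with a membership test against a set. The names `master_ids`, `subset_ids`, `pair_count`, and `row_count` are hypothetical.

```python
# ID columns of the sample data, in file order (illustration only).
master_ids = ["12", "12", "12", "12", "11", "16", "19", "19"]
subset_ids = ["12", "12", "12", "12", "19", "19"]

# Nested-loop version: counts every (master, subset) pair whose IDs differ.
pair_count = 0
for m in master_ids:
    for s in subset_ids:
        if m != s:
            pair_count += 1

# Membership version: counts each master row at most once,
# and only when its ID is absent from the whole subset.
seen = set(subset_ids)
row_count = sum(1 for m in master_ids if m not in seen)

print(pair_count)  # 28 pairs on the sample data
print(row_count)   # 2 rows on the sample data (IDs 11 and 16)
```

Even on this small sample the pairwise test fires 28 times for only 2 wanted rows. Against the real files, the same membership test could replace the inner loop, with `seen` built from the first column of the subset reader.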