I have a Python Script that generate a CSV (data parsed from a website). Here is an exemple of the CSV file:
File1.csv
China;Beijing;Auralog Software Development (Deijing) Co. Ltd.;;;
United Kingdom;Oxford;Azad University (Ir) In Oxford Ltd;;;
Italy;Bari;Bari, The British School;;Yes;
China;Beijing;Beijing Foreign Enterprise Service Group Co Ltd;;;
China;Beijing;Beijing Ying Biao Human Resources Development Limited;;Yes;
China;Beijing;BeiwaiOnline BFSU;;;
Italy;Curno;Bergamo, Anderson House;;Yes;
File2.csv
China;Beijing;Auralog Software Development (Deijing) Co. Ltd.;;;
United Kingdom;Oxford;Azad University (Ir) In Oxford Ltd;;;
Italy;Bari;Bari, The British School;;Yes;
China;Beijing;Beijing Foreign Enterprise Service Group Co Ltd;;;
China;Beijing;Beijing Ying Biao Human Resources Development Limited;;Yes;
This;Is;A;New;Line;;
Italy;Curno;Bergamo, Anderson House;;Yes;
As you can see,
China;Beijing;BeiwaiOnline BFSU;;; ==> This line from File1.csv is not more present in File2.csv and This;Is;A;New;Line;; ==> This line from File2.csv is new (is not present in File1.csv).
I am looking for a way to compare this two CSV files (one important thing to know is that the order of the lines doesn't count ... they cant be anywhere).
What I'd like to have is a script which can tell me: - One new line : This;Is;A;New;Line;; - One removed line : China;Beijing;BeiwaiOnline BFSU;;; And so on ... !
I've tried but without any success:
#!/usr/bin/python
# -*- coding: utf-8 -*-
import csv
f1 = file('now.csv', 'r')
f2 = file('past.csv', 'r')
c1 = csv.reader(f1)
c2 = csv.reader(f2)
now = [row for row in c2]
past = [row for row in c1]
for row in now:
#print row
lol = past.index(row)
print lol
f1.close()
f2.close()
_csv.Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?
Any idea of the best way to proceed ? Thank you so much in advance ;)
EDIT:
import csv
f1 = file('now.csv', 'r')
f2 = file('past.csv', 'r')
c1 = csv.reader(f1)
c2 = csv.reader(f2)
s1 = set(c1)
s2 = set(c2)
lol = s1 - s2
print type(lol)
print lol
This seems to be a good idea but :
Traceback (most recent call last):
File "compare.py", line 20, in <module>
s1 = set(c1)
TypeError: unhashable type: 'list'
EDIT 2 (Please don't care about what is above): *with your help, here is the script I'm writing :*
#!/usr/bin/python
# -*- coding: utf-8 -*-
import os
import csv
### COMPARISON THING ###
x=0
fichiers = os.listdir('/me/CSV')
for fichier in fichiers:
if '.csv' in fichier:
print('%s -----> %s' % (x,fichier))
x=x+1
choice = raw_input("Which file do you want to compare with the new output ? ->>>")
past_file = fichiers[int(choice)]
print 'We gonna compare %s to our output' % past_file
s_now = frozenset(tuple(row) for row in csv.reader(open('/me/CSV/now.csv', 'r'), delimiter=';')) ## OUR OUTPUT
s_past = frozenset(tuple(row) for row in csv.reader(open('/me/CSV/'+past_file, 'r'), delimiter=';')) ## CHOOSEN ONE
added = [";".join(row) for row in s_now - s_past] # in "now" but not in "past"
removed = [";".join(row) for row in s_past - s_now] # in "past" but not in "now"
c = csv.writer(open("CHANGELOG.csv", "a"),delimiter=";" )
line = ['AD']
for item_added in added:
line.append(item_added)
c.writerow(['AD',item_added])
line = ['RM']
for item_removed in removed:
line.append(item_removed)
c.writerow(line)
Two kind of errors:
File "programcompare.py", line 21, in <genexpr>
s_past = frozenset(tuple(row) for row in csv.reader(open('/me/CSV/'+past_file, 'r'), delimiter=';')) ## CHOOSEN ONE
_csv.Error: line contains NULL byte
or
File "programcompare.py", line 21, in <genexpr>
s_past = frozenset(tuple(row) for row in csv.reader(open('/me/CSV/'+past_file, 'r'), delimiter=';')) ## CHOOSEN ONE
_csv.Error: newline inside string
It was working few minutes ago but I've changed the CSV files to test with different datas and here I am :-)
Sorry, last question !