I have two csv files with a single column of data. How can I remove data in the second csv file in-place by comparing it with the data in the first csv file? For example:

 import csv
 reader1 = csv.reader(open("file1.csv", "rb")) 
 reader = csv.reader(open("file2.csv", "rb"))
 for line in reader:
     if line in reader1:
         print line

File 2

Arjun
abdo
  • Possible duplicate of [Deleting columns in a CSV with python](http://stackoverflow.com/questions/7588934/deleting-columns-in-a-csv-with-python) – Razik Aug 25 '16 at 15:56

1 Answer

If both files are just single columns, you could use a `set` to compute the difference. However, this presumes that duplicate entries within a file do not need to be preserved and that the order of the entries doesn't matter.

#since each file is a column, unroll each file into a single list:
dat1 = [x[0] for x in reader1]
dat2 = [y[0] for y in reader]

#take the set difference
dat1_without_dat2 = set(dat1).difference(dat2)
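
If order and duplicates in the second file do matter, one option is to keep the first file's values in a set for lookups only and filter the second file row by row. The following is a minimal sketch under some assumptions: it targets Python 3 (the snippets above are Python 2), the function name `remove_matching_rows` is made up for illustration, and "in-place" is approximated by writing a temporary file and swapping it over the original.

```python
import csv
import os
import tempfile


def remove_matching_rows(reference_path, target_path):
    """Rewrite target_path, dropping rows whose first column appears in reference_path."""
    # Collect the first-column values of the reference file for fast lookups.
    with open(reference_path, newline="") as ref:
        seen = {row[0] for row in csv.reader(ref) if row}

    # Write the surviving rows to a temporary file in the same directory,
    # then swap it over the original so the update appears in-place.
    # Empty rows in the target file are dropped as well.
    target_dir = os.path.dirname(os.path.abspath(target_path))
    with open(target_path, newline="") as src, tempfile.NamedTemporaryFile(
        "w", newline="", dir=target_dir, delete=False
    ) as tmp:
        writer = csv.writer(tmp)
        for row in csv.reader(src):
            if row and row[0] not in seen:
                writer.writerow(row)
    os.replace(tmp.name, target_path)
```

Calling `remove_matching_rows("file1.csv", "file2.csv")` would leave `file2.csv` with only the rows not found in `file1.csv`, in their original order.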
Gene Burinsky
  • the result is exception Traceback (most recent call last): File "C:/Users/cvs.py", line 8, in dat1 = [x[[0]] for x in reader1] TypeError: list indices must be integers, not list – abdo Aug 25 '16 at 16:02
  • Should be fixed now, sorry. I was just helping someone with `R`, hence `x[[0]]`, which is wrong. `x[0]` and `y[0]` should work. – Gene Burinsky Aug 25 '16 at 16:23