0

I am very new to Python. I am trying to read a csv file and displaying the result to another CSV file. What I want to do is I want to write selective rows in the input csv file on to the output file. Below is the code I wrote so far. This code read every single row from the input file i.e. 1.csv and write it to an output file out.csv. How can I tweak this code say for example I want my output file to contain only those rows which starts with READ in column 8 and rows which are not equal to 0000 in column 10. Both of these conditions need to be met. Like start with READ and not equal to 0000. I want to write all these rows. Also this block of code is for a single csv file. Can anyone please tell me how I can do it for say 10000 csv files ? Also when I execute the code, I can see spaces between lines on my out csv. How can I remove those spaces ?

import csv
f1 = open("1.csv", "r")
reader = csv.reader(f1)
header = reader.next()
f2 = open("out.csv", "w")
writer = csv.writer(f2)
writer.writerow(header)
for row in reader:
    writer.writerow(row)
f1.close()
f2.close()
Muscles
  • 41
  • 9
  • You should check this post out https://stackoverflow.com/questions/8009882/how-to-read-large-file-line-by-line-in-python. The rest of the questions have most likely also been posted on SO and are just a Google away :). – Landmaster Aug 10 '17 at 01:38

1 Answers1

0

Something like:

import os
import csv
import glob

class CSVReadWriter(object):

    def munge(self, filename, suffix):
        name,ext = os.path.split(filename)
        return '{0}{1}.{2}'.format(name, suffix, ext)

    def is_valid(self, row):
        return row[8] == 'READ' and row[10] == '0000'

    def filter_csv(fin, fout):
        reader = csv.reader(fin)
        writer = csv.writer(fout)

        writer.write(reader.next())  # header
        for row in reader:
            if self.is_valid(row):
                writer.writerow(row)

    def read_write(self, iname, suffix):
        with open(iname, 'rb') as fin:
            oname = self.munge(filename, suffix)
            with open(oname, 'wb') as fout:
                self.filter_csv(fin, fout)

work_directory = r"C:\Temp\Data"

for filename in glob.glob(work_directory):
    csvrw = CSVReadWriter()
    csvrw.read_write(filename, '_out')

I've made it a class so that you can over ride the munge and is_valid methods to suit different cases. Being a class also means that you can store state better, for example if you wanted to output lines between certain criteria.

The extra spaces between lines that you mention are to do with \r\n carriage return and line feed line endings. Using open with 'wb' might resolve it.

Mike Robins
  • 1,733
  • 10
  • 14
  • Hi Mike, Thank You so much. Can you please tell me where in the code should I plug in my filename that is 1.csv ? – Muscles Aug 10 '17 at 14:15
  • In my example I'm using the [glob module](https://docs.python.org/2/library/glob.html#module-glob) to generate a list of all the files that it finds in `C:\Temp\Data`. I suggest you change `work_directory` to something like `muscles\*.csv`. For your testing you could change the for statement to `for filename in ['1.csv']:`. You might also change `munge` around to put the output files into another directory to keep them separate. – Mike Robins Aug 11 '17 at 00:38
  • I has bothered me that the file IO and CSV handling was all mixed in together in one routine. I have refactored the code to separate IO and CSV operations. – Mike Robins Aug 16 '17 at 09:04