
I know how to remove lines from a single CSV file, but I'm looking to remove multiple lines from multiple CSV files.

Here's my code:

import glob

# Every .csv file in the current directory (non-recursive)
myfiles = glob.glob('*.csv')

for path in myfiles:
    with open(path, 'r') as fin:
        data = fin.read().splitlines(True)
    with open(path, 'w') as fout:
        # data[4:] drops the first 4 lines
        fout.writelines(data[4:])

I want to achieve the following: 1) iterate through the current directory; 2) remove the first 4 lines of each CSV file and save it.
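A slice drops exactly as many leading lines as its start index, so removing the first 4 lines means slicing from index 4; `[3:]` would drop only 3 and `[5:]` would drop 5. A small check on a hypothetical in-memory list of lines:

```python
# Hypothetical file contents: 4 header lines followed by 2 data rows
lines = ["h1\n", "h2\n", "h3\n", "h4\n", "a,1\n", "b,2\n"]

# Slicing from index 4 skips indices 0-3, i.e. the first 4 lines
kept = lines[4:]
print(kept)  # ['a,1\n', 'b,2\n']
```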

xax0
  • Check your loops. You iterate over the files first and re-define `data` every time. Then, with the **last** definition of `data`, you run your second loop. You want to do this in one loop with read/write mode (`r+`) or (better) create a new file with the reduced content. – moestly Apr 17 '17 at 12:42
  • How big are the files? readlines() loads all the data into memory. – oshaiken Apr 17 '17 at 12:45
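Following the suggestions to write the reduced content to a new file and to avoid loading whole files into memory, here is a minimal sketch (not from either commenter) that streams each file into a temporary file and then swaps it in with `os.replace`. The names `drop_first_lines` and `strip_all_csv` are hypothetical, and it assumes the files are readable in the platform's default text encoding.

```python
import glob
import itertools
import os
import tempfile


def drop_first_lines(path, n=4):
    """Remove the first n lines of `path` without reading the whole file into memory."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the original so os.replace stays on one filesystem
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", newline="") as dst, open(path, "r", newline="") as src:
            # islice skips the first n lines, then yields the rest one at a time
            dst.writelines(itertools.islice(src, n, None))
        # Swap the reduced copy in place of the original
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything went wrong, then re-raise
        os.remove(tmp_path)
        raise


def strip_all_csv(directory=".", n=4):
    """Apply drop_first_lines to every .csv file directly inside `directory`."""
    for path in glob.glob(os.path.join(directory, "*.csv")):
        drop_first_lines(path, n)
```

Calling `strip_all_csv()` with no arguments processes the current directory. One caveat: `tempfile.mkstemp` creates the file readable and writable only by the current user, so each replaced CSV ends up with those permissions rather than its original ones.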

2 Answers


This answer looks helpful. Here is a slight tweak of it to handle multiple files:

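import glob

myfiles = glob.glob('*.csv')
for file in myfiles:
    # Read every line of the file
    with open(file) as f:
        lines = f.readlines()
    # Overwrite the file, dropping the first 4 lines
    with open(file, 'w') as f:
        f.writelines(lines[4:])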
mgig

If you want to use Python, you can do this:

import csv
import os


def read_csv(inputfile, outputfile):
    try:
        with open(inputfile, 'r', newline='') as csvfile:
            # Default dialect: comma-delimited, double-quote quoting
            reader = csv.reader(csvfile)
            for i, line in enumerate(reader):
                # Rows 0-3 are the first 4 rows; keep everything after them
                if i > 3:
                    write_csv(line, outputfile)
    except IOError:
        print("IOError in", inputfile)


def write_csv(row, outputfile):
    # Append one row to the output file
    with open(outputfile, 'a', newline='') as f:
        writer = csv.writer(f, lineterminator='\n')
        writer.writerow(row)


def main():
    indir = 'path to dir with csv'

    for root, dirs, filenames in os.walk(indir):
        for f in filenames:
            if f.endswith('.csv'):
                filename = os.path.join(root, f)
                # Write the result next to the input, e.g. data.csv -> outputdata.csv
                read_csv(filename, outputfile=os.path.join(root, 'output' + f))


if __name__ == "__main__":
    main()
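The csv-module approach above can also be written so the output file is opened once instead of once per row in append mode, which also means rerunning it overwrites the output rather than appending to it. A minimal sketch, using the hypothetical name `copy_without_header_rows` and assuming comma-delimited files:

```python
import csv
import itertools


def copy_without_header_rows(inputfile, outputfile, n=4):
    """Write every row of `inputfile` except the first n to `outputfile`."""
    with open(inputfile, newline="") as src, open(outputfile, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst, lineterminator="\n")
        # Skip the first n records, then copy the remainder in one pass
        writer.writerows(itertools.islice(reader, n, None))
```

Note that `csv.reader` counts records, not physical lines, so a quoted field containing a line break counts as one row; the line-based approaches (`readlines()` slicing, `tail`) count physical lines.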

Or, from a shell, you can use (`tail -n +K` starts output at line K, so +5 skips the first 4 lines):

tail -n +5 original.csv > original-4-top-lines.csv
oshaiken