Selecting lines from many csv files and creating new files

Question

I have a folder with hundreds of csv files, each with one column containing country names. I would like to loop through all the files, select the lines with the country name "FIN", and create new csv files from the selected lines.

This is how far I have gotten:

import csv
import glob

for filename in glob.glob('\directory\*.csv'):
    with open(filename, 'r') as i, open('\directory_for_new_files\fin_{}'.format(filename), 'w') as o:
        r = csv.reader(i, delimiter=',')
        w = csv.writer(o, delimiter=',')
        for row in r:
            if 'FIN' in row[3] or 'flag' in row[3]:
                w.writerow(row)

The 'fin_{}'.format(filename) part seems to be the problem: when I replace it with a fixed name (like 'testfile.csv') the script works, but then it overwrites the same file every time. So how do I get the script to create a new output file for every input file?

Error message:

with open(r'D:\Koko Suomen ihmispaineet\Ihmispaineet_26_10_2018\Global fishing watch\fishing_effort\daily_csvs_finland\fin_{}'.format(filename), 'w') as o:

IOError: [Errno 22] invalid mode ('w') or filename: 'D:\Koko Suomen ihmispaineet\Ihmispaineet_26_10_2018\Global fishing watch\fishing_effort\daily_csvs_finland\fin_D:\Koko Suomen ihmispaineet\Ihmispaineet_26_10_2018\Global fishing watch\fishing_effort\daily_csvs\2012-01-01.csv'

Do your errors disappear when you use [the correct number of backslashes](https://stackoverflow.com/q/19065115/2564301)? — Jongware, Nov 07 '18 at 14:01

Patrick Artner · Accepted Answer · 2018-11-07T14:25:52.177

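3 problems:

1. Wrong slashes. You need to use either
   - raw strings with backslashes: r"\somedir\somefi.le", or
   - escaped backslashes: "\\somedir\\somefi.le", or
   - forward slashes instead, which "simply work": "/somedir/somefi.le"
2. You forgot to specify newline="" when opening the csv file for writing. (This is the Python 3 form; the IOError wording in your traceback suggests Python 2, where you would open the file in 'wb' mode instead.)
3. glob returns filenames including their path. You only need the bare filename when building the output name; otherwise the full input path, drive letter included, ends up in the middle of the output path, which is what your error message shows.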
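Problems 1 and 3 can be seen in isolation with a minimal sketch. The paths are made up for illustration (they are shortened stand-ins, not the ones from the question), and `ntpath` is used here only so that Windows-style paths are split the same way on any platform; on Windows itself `os.path` behaves the same.

```python
import ntpath  # Windows flavour of os.path, importable on every platform

plain = 'out\fin_a.csv'    # '\f' is read as a single form feed character
raw = r'out\fin_a.csv'     # raw string: the backslash is kept literally

print(repr(plain))
print(repr(raw))
print(len(plain), len(raw))   # the raw string is one character longer

# glob returns the full path; only the last component belongs in the new name
found = r'D:\daily_csvs\2012-01-01.csv'   # hypothetical glob result
name = ntpath.basename(found)
print('fin_{}'.format(name))
```

So in the question's output path the `\f` of `\fin_` is swallowed as an escape sequence, and formatting the whole glob result into the name puts a second `D:` in the middle of the path.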

Fixed example:

import csv
import glob
import os

# create demo files
for k in "abc":
    with open("./{}.csv".format(k), "w") as f:
        f.write(k + ",b,c,FIN,d\n")
        f.write(k + ",b,c,not,d\n")
        f.write(k + ",b,c,flag,d\n")


# read the input files / create the new files:

extract = {'FIN','flag'}

for filename in glob.glob('./*.csv'):
    _, fn = os.path.split(filename)                            # fix here
    with open(filename, 'r') as i, \
         open('./fin_{}'.format(fn), 'w', newline="") as o:    # 2 fixes here
        r = csv.reader(i, delimiter=',')
        w = csv.writer(o, delimiter=',')
        for row in r:
            if row[3] in extract:                              # improvement
                w.writerow(row)    


# test creation and content
for filename in glob.glob('./*.csv'):
    print(filename)
    with open(filename) as f:
        print(f.read())
    print("------------")

Output:

./a.csv
a,b,c,FIN,d
a,b,c,not,d
a,b,c,flag,d

------------
./b.csv
b,b,c,FIN,d
b,b,c,not,d
b,b,c,flag,d

------------
./c.csv
c,b,c,FIN,d
c,b,c,not,d
c,b,c,flag,d

------------
./fin_a.csv
a,b,c,FIN,d
a,b,c,flag,d

------------
./fin_b.csv
b,b,c,FIN,d
b,b,c,flag,d

------------
./fin_c.csv
c,b,c,FIN,d
c,b,c,flag,d    

------------
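The demo above reads and writes in the current directory, while the question uses separate input and output folders. A possible way to adapt it is sketched below, assuming Python 3. The function name `filter_csv_files` and its parameters are hypothetical, chosen for this illustration, and the length check on each row is an added assumption about how to treat short lines.

```python
import csv
import glob
import os


def filter_csv_files(src_dir, dst_dir, wanted, column=3, prefix='fin_'):
    """Copy rows whose `column` value is in `wanted` into one new file per input.

    Returns the list of output paths that were written.
    """
    os.makedirs(dst_dir, exist_ok=True)   # create the output folder if missing
    written = []
    for path in sorted(glob.glob(os.path.join(src_dir, '*.csv'))):
        # basename() drops the directory part, so only the filename is reused
        out_path = os.path.join(dst_dir, prefix + os.path.basename(path))
        with open(path, 'r', newline='') as src, \
             open(out_path, 'w', newline='') as dst:
            writer = csv.writer(dst)
            for row in csv.reader(src):
                # skip rows that are too short to have the column at all
                if len(row) > column and row[column] in wanted:
                    writer.writerow(row)
        written.append(out_path)
    return written
```

It would be called as, for example, `filter_csv_files(r'D:\input_folder', r'D:\output_folder', {'FIN', 'flag'})`, with both folder names being placeholders. Keeping the output in a different folder also means a second run does not pick up the `fin_` files as input, which the same-directory demo would do.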
