
I'm trying to figure out how to make Python go through a directory full of CSV files, process each file, and spit out a text file with a trimmed list of values.

In this example, I'm iterating through a CSV with lots of different columns, but all I really want are the first name, last name, and keyword. I have a folder full of these CSVs with varying columns (except they all contain first name, last name, and keyword somewhere). What's the best way to open that folder, go through each CSV file, and then write it all out as either its own CSV file or just a text list, as in the example below?

import csv

reader = csv.reader(open("keywords.csv"))
F = open('compiled.txt', 'w')
rownum = 0
for row in reader:
    if rownum == 0:
        # Header row: find the indices of the columns we need
        for headnum, col in enumerate(row):
            if col == 'Keyword':
                keywordnum = headnum
            elif col == 'First Name':
                firstnamenum = headnum
            elif col == 'Last Name':
                lastnamenum = headnum
    else:
        # Data row: print the three values and save the keyword
        print(row[keywordnum] + '\n' + row[firstnamenum] + '\n' + row[lastnamenum])
        F.write(row[keywordnum] + '\n')
    rownum += 1
F.close()
– Imran

5 Answers


The best way is probably to use the shell's globbing ability, or alternatively Python's glob module.

Shell (Linux, Unix)

Shell:

python myapp.py folder/*.csv

myapp.py:

import sys

for filename in sys.argv[1:]:
    with open(filename) as f:
        pass  # do something with f

Windows (or no shell available)

import glob

for filename in glob.glob("folder/*.csv"):
    with open(filename) as f:
        pass  # do something with f

Note: Python 2.5 needs from __future__ import with_statement
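On Python 2.5, that future import must appear at the top of the module, before any other statements:

from __future__ import with_statement  # enables 'with' on Python 2.5

with open("folder/example.csv") as f:  # example path, for illustration only
    pass  # do something with f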

– Georg Schölly
    Note about using globs -- OS X returns the set sorted alphabetically, Linux returns it in no particular order. May not matter but good to know. – jpsimons Jan 02 '10 at 20:47

The "get all the CSV files" part of the question has been answered several times (including by the OP), but the "get the right named columns" hasn't yet: csv.DictReader makes it trivial -- the "process one CSV file" loop becomes just:

reader = csv.DictReader(open(thecsvfilename))
for row in reader:
    # str.join takes a single iterable argument, not separate arguments
    print('\n'.join([row['Keyword'], row['First Name'], row['Last Name']]))
    F.write(row['Keyword'] + '\n')
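Combined with the glob loop from the accepted answer, the whole script might look like this sketch (the folder path and the compiled.txt name are assumptions carried over from the question):

import csv
import glob

with open('compiled.txt', 'w') as F:
    for filename in glob.glob('folder/*.csv'):
        with open(filename) as f:
            for row in csv.DictReader(f):  # keys come from each file's header row
                F.write(row['Keyword'] + '\n')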
– Alex Martelli

A few suggestions:

  • You could keep the header indices for Keyword, First Name, and Last Name in a map instead of using separate variables. This would make it easier to modify the script later on.

  • You could use the list index() function instead of looping over the headers, e.g.:

    if rownum == 0:
        header_index = {}  # map column name -> index
        for header in ('Keyword', 'First Name', 'Last Name'):
            header_index[header] = row.index(header)
    
  • You could use the glob module to grab the filenames, but gs (Georg Schölly) is probably right that shell globbing is a better way to do it.

  • It might be better to use the csv module for writing the file as well; it handles quoting and escaping, so it would be more robust. (See the sketch just below this list.)
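A minimal sketch of that last suggestion, assuming the csv.DictReader loop from Alex Martelli's answer (the reader variable and both file names here are assumptions, not part of the original answer):

import csv

# Sketch only: 'reader' is a csv.DictReader over one input file, as above.
with open('compiled.csv', 'w', newline='') as out:  # newline='' is the Python 3 idiom
    writer = csv.writer(out)  # csv.writer quotes fields containing commas or quotes
    writer.writerow(['First Name', 'Last Name', 'Keyword'])  # header row
    for row in reader:
        writer.writerow([row['First Name'], row['Last Name'], row['Keyword']])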

– p-static

I think the best way to process a bunch of files in a directory is with os.walk (documented in the Python os module docs).

Here is an answer I wrote to another Python question, which includes working, tested Python code that uses os.walk to open a bunch of files. That version visits all subdirectories too, but it would be easy to modify it to stay in a single directory.

Replace strings in files by Python
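A minimal sketch of that approach, assuming a folder path of 'folder' (not taken from the linked answer):

import os

for dirpath, dirnames, filenames in os.walk('folder'):
    for name in filenames:
        if name.endswith('.csv'):
            with open(os.path.join(dirpath, name)) as f:
                pass  # process each CSV here
    del dirnames[:]  # stay in the top directory; delete this line to recurse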

– steveha

And I've answered my own question again... I imported the os and glob modules to nab a path.
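Imran didn't post his final code; a sketch of combining os and glob to build the path might look like this (the folder name is an assumption):

import glob
import os

folder = os.path.join(os.getcwd(), 'csvfolder')  # assumed folder name
for path in glob.glob(os.path.join(folder, '*.csv')):
    print(path)  # each CSV file found in the folder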

– Imran