Need to do a math operation on every line in several CSV files in Python

Question

I have about 100 CSV files I have to operate on once a month and I was trying to wrap my head around this but I'm running into a wall. I'm starting to understand some things about Python, but combining several things is still giving me issues, so I can't figure this out.

Here's my problem:

I have many CSV files, and here's what I need done:

add a "column" to the front of each row (or the back, doesn't matter really, but front is ideal). In addition, each line has 5 rows (not counting the filename that will be added), and here's the format:

6-digit ID number,YYYY-MM-DD(1),YYYY-MM-DD(2),YYYY-MM-DD(3),1-2-digit number

I need to subtract YYYY-MM-DD(3) from YYYY-MM-DD(2) for every line in the file (there is no header row), for every CSV in a given directory.

I need the filename inside the row because I will combine the files (which, if is included in the script would be awesome, but I think I can figure that part out), and I need to know what file the records came from. Format of filename is always '4-5-digit-number.csv'

I hope this makes sense, if it does not, please let me know. I'm kind of stumped as to where to even begin, so I don't have any sample code that even really began to work for me. Really frustrated, so I appreciate any help you guys may provide, this site rocks!

Mylan

I don't understand why you would need the filename inside the file itself. When you open the file, you know its name. — Tim Pietzcker, Sep 08 '11 at 15:54
I will ultimately be combining all files into one, so I will need to know which row comes from what file. Each file basically outlines some data from a particular facility, and I need to know which facility that is for, so that's why I wanted the filename in each row. And I did mean column, sorry. — mrmylanman, Sep 08 '11 at 16:17

Tim Pietzcker · Accepted Answer · 2011-09-08T16:25:22.870

7

There's a tool in the standard library for each of these tasks:

To iterate over all CSV files in a directory, use the glob module:

import glob
for csvfilename in glob.glob(r"C:\mydirectory\*.csv"):
    #do_something

To parse a CSV file, use the csv module:

import csv
with open(csvfilename, "rb") as csvfile:
    reader = csv.reader(csvfile, delimiter=",")
    for row in reader:
        # row is a list of all the entries in the current row

To parse a date and calculate a difference, use the datetime module:

from datetime import datetime
startdate = datetime.strptime("1999-10-20", "%Y-%m-%d")
enddate = datetime.strptime("2003-02-28", "%Y-%m-%d")
delta = enddate - startdate # difference in days

To add a value to the beginning of a row:

row[0:0] = [str(delta)]

To append the filename to the end of a row:

row.append(csvfilename)

And to write a row to a new CSV file:

with open(csvfilename, "wb") as csvfile:
    writer = csv.writer(csvfile, delimiter=",")
    writer.writerow(row)

Taken all together, you get:

import glob
import csv
from datetime import datetime

with open("combined_files_csv", "wb") as outfile:
    writer = csv.writer(outfile, delimiter=",")
    for csvfilename in glob.glob(r"C:\mydirectory\*.csv"):
        with open(csvfilename, "rb") as infile:
            reader = csv.reader(infile, delimiter=",")
            for row in reader:
                startdate = datetime.strptime(row[3], "%Y-%m-%d")
                enddate = datetime.strptime(row[2], "%Y-%m-%d")
                delta = enddate - startdate # difference in days
                row[0:0] = [str(delta)]
                row.append(csvfilename)
                writer.writerow(row)

edited Sep 08 '11 at 16:25

answered Sep 08 '11 at 16:14

Tim Pietzcker

328,213
58
503
561

2

Awesome, thank you for these tips, I greatly appreciate the direction – mrmylanman Sep 08 '11 at 16:18
I tried out your script, and it seems like it does exactly what I need, however when it gets to line 37 ("for row in reader:"), it throws the error "_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)". I am using Mac OS 10.6 and Python 3.2.1 if that makes a difference. Thanks – mrmylanman Sep 08 '11 at 16:57
Actually I got it going now. I overlooked changing "combined_files_csv" from a string to an actual file. Works like a charm now, thanks a ton! This also gives me insight into how to do other tasks I have to do on CSVs regularly. – mrmylanman Sep 08 '11 at 17:15
Ah, yes; in Python 3, the csv module works a little differently; you don't open the files in binary mode, and there is an additional parameter `newline`; see [the docs](http://docs.python.org/py3k/library/csv.html). So you would do `open("file.csv", "w", newline = "")` and `writer = csv.writer(outfile, delimiter = "")`. It's a good idea to use the `python-3.x` tag in addition to the `python` tag because without it, most people assume you're talking about Python 2. – Tim Pietzcker Sep 08 '11 at 17:28
Thanks again, I'm still trying to get used to the differences of the two. I figured it'd be best for future-proofing myself to work in Python3, but it's a little tricky sometimes as most of the information I find that I reference is written for 2.x and I have to change it to work with 3.x. Thanks for all the help! – mrmylanman Sep 08 '11 at 17:37

score 0 · Answer 2 · answered Sep 08 '11 at 15:57

0

The basic outline of the program is going to be like this:

Use the os module to get the filenames out of the directory/directories of interest
Read in each file one at a time
For each line in the file, split it into columns with columns = line.split(",")
Use datetime.date to convert strings like "2011-05-03" to datetime.dates.
Subtract the third date from the second, which yields a datetime.timedelta.
Put all your information in the format you want (hint: str(foo) yields a string representation of foo, for just about any type) and remember it for later
Close your file, reopen it for writing, and write your new stuff in

answered Sep 08 '11 at 15:57

Vanessa MacDougal

982
6
14

I'd suggest replacing step 7 with `write your new stuff out to a temporary file`. Then `8. Rename original file to .bak suffixed name`. `9. Rename temporary file to original filename`. So as to minimise risk of losing data. – MattH Sep 08 '11 at 16:17
Thank you very much, this is a big help – mrmylanman Sep 08 '11 at 16:18

Need to do a math operation on every line in several CSV files in Python

2 Answers2

Linked